Screaming in the Cloud - Corey Screws Up Logstash For Everyone with Jordan Sissel

Episode Date: September 29, 2021

About Jordan
Jordan is a self-proclaimed “hacker.”
Links:
Twitter: https://twitter.com/jordansissel ...

Transcript
Starting point is 00:00:00 Hello, and welcome to Screaming in the Cloud, with your host, Chief Cloud Economist at the Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud. This episode is sponsored in part by Yugabyte. Distributed technologies like Kubernetes are great, citation very much needed,
Starting point is 00:00:39 because they make it easier to have resilient, scalable systems. SQL databases haven't kept pace, though. Certainly not like NoSQL databases have, like Route 53, the world's greatest database. We're still, other than that, using legacy, monolithic databases that require ever-growing instances of compute. Sometimes we'll try and bolt them together to make them more resilient and scalable, but let's be honest, it never works out well. Consider YugabyteDB. It's a distributed SQL database that solves basically all of this. It is 100% open source, and there's no asterisk next to the open on that one,
Starting point is 00:01:18 and it's designed to be resilient and scalable out of the box so you don't have to shard yourself to death. It's compatible with PostgreSQL, or Postgresqueel as I insist on pronouncing it, so you can use it right away without having to learn a whole new language and refactor everything. And you can distribute it wherever your applications take you, from across availability zones to other regions or even other cloud providers should one of those happen to exist. Go to yugabyte.com, that's Y-U-G-A-B-Y-T-E dot com, and try their free beta of Yugabyte Cloud, where they host and manage it for you.
Starting point is 00:01:53 Or see what the open source project looks like. It's effortless distributed SQL for global apps. My thanks to Yugabyte for sponsoring this episode. This episode is sponsored in part by our friends at VMware. Let's be honest, the past year has been far from easy due to, well, everything. It caused us to rush cloud migrations and digital transformation, which of course means long hours refactoring your apps, surprises on your cloud bill, misconfigurations, and headaches for everyone trying to manage disparate and fractured cloud environments. VMware has an answer for this. With VMware's multi-cloud solutions, organizations have the choice, speed, and control to migrate and optimize
Starting point is 00:02:37 applications seamlessly without recoding, take the fastest path to modern infrastructure, and operate consistently across the data center, the edge, and any cloud. I urge you to take a look at vmware.com slash go slash multicloud. You know my opinions on multicloud by now, but there's a lot of stuff in here that works on any cloud. But don't take it from me. That's vmware.com slash go multi-cloud, all one word. And my thanks to them again for their sponsorship of my ridiculous nonsense. Welcome to Screaming in the Cloud. I'm Corey Quinn. I've been to a lot of conference talks in my life. I've seen good ones, I've seen terrible ones, and then I've seen the ones that are way worse than that. But we don't tend to think
Starting point is 00:03:23 in terms of impact very often about how conference talks can move the audience. In fact, that's the only purpose of giving a talk ever, to my mind, is you're trying to spark some form of alchemy or shift in the audience and convince them to do something. Maybe in the banal sense, it's to sign up for something that you're selling, or to go look at your website, or to contribute to a project, or maybe it's to change the way they view things. One of the more transformative talks I've ever seen that shifted my outlook on a lot of things was at SCALE in 2012. The person who gave that talk is my guest today, Jordan Sissel, who, among many other things in his career,
Starting point is 00:04:01 was the original creator behind Logstash, which is the L in ELK Stack. Jordan, thank you for joining me. Thanks for having me, Corey. I don't know how well you remember those days in 2012. It was the dark times. We thought, oh, the world is going to end. That wouldn't happen until 2020. But it was an interesting conference full of a bunch of open source folks. It was my local conference because I lived in Los Angeles. And it was the thing I looked forward to every year because I would always go and learn something new. I was in the trenches in those days. And I had a bunch of
Starting point is 00:04:32 problems that looked an awful lot like other people's problems. And having a hallway track where you could ask, "Hey, how are you solving this problem?" was a big deal. I miss those days in some ways. Yeah, SCALE was a particularly good conference. I think I made it twice. Traveling down to LA was infrequent for me, but I always enjoyed how it was a very communal setting. They had dedicated hallway tracks. They had kids tracks, which I thought was great because folks couldn't usually come to conferences
Starting point is 00:05:01 if they couldn't bring their kids or they had to take care of that stuff. But having a kids track was great. They had kids presenting. It felt more organic than a lot of other conferences did. And that's kind of what drew me to it initially. Yeah, it was my local network. It turns out that the Southern California tech community is relatively small and we all lead different lives.
Starting point is 00:05:21 And it's L.A., let's face it. I lived there for over a decade. Flaking is a way of life. So yeah, oh, we'll go out and catch dinner. Ooh, half of them flake at the last minute. If you're one of the good people, you tell people you're flaking instead of just no-showing, but it happens.
Starting point is 00:05:33 But this was the thing that we would gather and catch up every year. And oh, what have you been doing? Wow, you work at that company now. Congratulations slash what's wrong with you? It was fun, just sort of a central sync point. It started off as hanging out with friends. And in those days, I was approaching the idea of, you know what? I should learn to give a conference talk someday. But let's be clear. People don't give conference
Starting point is 00:05:56 talks. Legends give conference talks. One day, I'll be good enough to get on stage and give a talk to my peers at a conference. Now, the easy cynical interpretation would be, well, then I saw your talk and I figured, hey, any jackhole can get up there. If he can do it, anyone can. But that's not at all how it wound up impacting me. You were talking about Logstash, which, let's start there, because that's a good entry point. Logstash was transformative for me. Before that, I'd spent a lot of time playing around with syslog. Usually rsyslog, but there are other stories here. When a system does something and it spits out logs, ideally, how do you make sure you capture those logs in a reliable way? So if you restart a computer, you don't wind up with a gap in your
Starting point is 00:06:40 logs. If it's the right computer, it can be a gap in everything's logs while the thing's coming back up. And that's "avoid single points of failure" and the rest. And I had done all kinds of horrible monstrosities. And someone said to me at one point, well, there are a couple of options. Why don't you use Splunk? And the answer was that I don't have a spare princess lying around that I can ransom back to her kingdom. So I can't afford it. Okay, what about Logstash? And my answer was, what's a Logstash? And thus that sound was Pandora's box creaking open. So I started playing with it and realized, okay, this is interesting. And I lost track of it because we have demands
Starting point is 00:07:15 on our time. Then I was dragged into a session that you gave and you explained what Logstash was. I'm not going to do nearly as good of a job as you can on this. What the hell was Logstash for folks who are not screaming at syslog while they first hear of it? All right, so you mentioned rsyslog, and "old" is often a pejorative for more established projects, but I don't think these projects are bad. But rsyslog, syslog-ng, things like that were common to see for me as a sysadmin. But to talk about Logstash, we need to go back a little further than 2012. So the Logstash project started... I disagree because I wasn't aware of it until 2012, until I become aware of something.
Starting point is 00:07:56 It doesn't really exist. That's right. I have the object permanence of an infant. That's fair. And I've always felt like perception is reality. So if someone, this gets into something I like to say, but if someone is having a bad time or someone doesn't know about something, then it might as well not exist. So Logstash as a project kind of started in 2008, 2009. I don't remember when the first commits landed, but it was, it was, gosh, it's more than 10 years ago now. But even before that, in college, I was fortunate to, through a network of friends, get a job as a sysadmin. And as a sysadmin, you stare at logs a lot
Starting point is 00:08:32 to figure out what's going on. And I wanted a more interesting way to process the logs. I had taught myself regular expressions and I wasn't finding joy in it at all. Like pretty much most people, probably: either they look at regular expressions and just evacuate with disgust, which is absolutely an appropriate response, or they dive into it and they have to use it for their job, but it wasn't enjoyable. And I found myself repeating stuff a lot,
Starting point is 00:09:01 matching IP addresses, matching strings, URLs, just trying to pull out useful information about what is going on. Oh, and the timestamp problem too. One of the things that I think people don't understand who have not played in this space is that all systems do have logs, unless you've really pooched something somewhere. And it shows that at this point in time, this thing happened. As we start talking about multiple computers and distributed systems, but even on the same computer, great. So at this time, there was something that showed up in the system log because there was a disk event or something. And at the same time, you have application logs that are talking about what the application running is talking about. And that is ideally using a
Starting point is 00:09:39 somewhat similar system to do this, but often not. And the way that timestamps are expressed in these are radically different. And the way that the log files themselves are structured, one might be timestamp followed by host name, followed by error code. The other one might be host name followed by timestamp in a different format, followed by a copyright notice because a big company got to it, followed by the actual event notice. And trying to disambiguate all of these into a standardized form was, first, obnoxious, and secondly, very important because you want to see the exact chain of events. This also leads to a separate sidebar on making sure that all the clocks are synchronized, but that's a separate story for another time. And that's where you enter the story in many respects.
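The normalization problem described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: both log lines, the host name `web01`, and the two layouts are made up for the example, and the syslog-style line is assumed to be in UTC because that format carries neither a year nor a time zone.

```python
from datetime import datetime, timezone

# Two hypothetical log lines about related events, each with its own layout.
SYSLOG_LINE = "Sep 29 14:03:07 web01 kernel: disk sda1 I/O error"
APP_LINE = "web01 2021-09-29T14:03:09+00:00 ERROR payment failed"


def parse_syslog_style(line, year):
    """Layout: 'Mon DD HH:MM:SS host message'. No year or zone in the line,
    so the caller supplies the year and UTC is assumed."""
    month, day, clock, host, message = line.split(None, 4)
    stamp = datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")
    return {"time": stamp.replace(tzinfo=timezone.utc), "host": host, "message": message}


def parse_app_style(line):
    """Layout: 'host ISO-8601-timestamp message'."""
    host, stamp, message = line.split(None, 2)
    return {"time": datetime.fromisoformat(stamp), "host": host, "message": message}


# Once both sources share one structure, the chain of events can be ordered.
events = [parse_app_style(APP_LINE), parse_syslog_style(SYSLOG_LINE, year=2021)]
events.sort(key=lambda event: event["time"])

for event in events:
    print(event["time"].isoformat(), event["host"], event["message"])
```

Each source still needs its own small parser, but everything downstream only sees one shape: a time, a host, and a message. Ordering across machines is only as trustworthy as the clocks that produced the timestamps.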
Starting point is 00:10:21 Right. So my sort of thought around what led to Logstash is you could take a sysadmin or a software IT developer, whatever, expert, and you can sit them in front of a bunch of logs and they can read them and say, that's the time it happened. That's the user who caused this action. This is the action. But if you try and abstract and step away, and so you ask, how many times did this action happen? When did this user appear? What time did this happen?
Starting point is 00:10:47 You start losing the ability to ask those questions without being an expert yourself or sitting next to an expert and having them be your keyboard. Kind of a phenomenon I call the human keyboard problem, where you're speaking to a computer, but someone has to translate for you. And so in around 2004, I was super into Perl. No shocker that I enjoyed, ish, I sort of enjoyed regular expressions, but I was super into Perl. And there was a Perl module called Regexp::Common, which was a library of regular expressions to match known things, IP addresses, certain kinds of timestamps,
Starting point is 00:11:26 quoted strings, and whatnot. And this stuff is always challenging because it sounds like, oh, an IP address. One of the interview questions I hated the most that someone asked me was, write a regular expression to detect an IP address. It turns out that to do this correctly, even if you bound it to IPv4 only, the answer takes up multiple lines on a screen. Oh, for sure. It's enormous. It's like a full page of code you can't read. And that's one of the things that it was sort of like standing on the shoulders of the person who came before. It was kind of an epiphany to me. Yeah. So I can copy and paste that into my code, but someone has to maintain that thing after I get fired is going to be, what the hell is
Starting point is 00:12:05 this? And what does it do? It's like, it's the blessed artifact that the ancients built it and left it there. Like it's a Stargate sitting in your code. And it's, we don't know how it works. We're scared to break it. So we don't even look at that thing directly. We just know that we put nonsense in and IP address comes out and let's not touch it ever again. Exactly. And even to your example, even before you get fired and someone replaces you and looks at your regular expression, the problem I was having was I would have this library of copy and pasteable things, and then I would find a bug, an edge case, and I would fix that edge case. But the other 15 scripts that were using the same
Starting point is 00:12:40 regular expression, I can't even read them anymore because I don't carry that kind of context in my head for all of that syntax. So you either have to go back and copy and paste and fix all those old regular expressions, or you just say, you know what, we're not going to fix the old code. We have a new version of it that works here, but everywhere else this edge case fails. So that's one of the things that kind of drew me to the Regexp::Common library in Perl was that it was reusable and things had names. It was, I want to match an IP address. You didn't have to memorize that long piece of text to precisely and accurately accept only IP addresses and reject things that are not. You just said, give me the regular expression that matches an IP. And that library gave me the idea to
Starting point is 00:13:25 write Grok. Well, if we could name things, then maybe we could turn that into some kind of data structure. It's sort of the combination of: I have a piece of log data, and as an expert, I know that's an IP address, that's the username, and that's the timestamp. Well, now I can apply this library of regular expressions that I didn't have to write, and that hopefully has a unit test suite, and say: instead of that plain piece of text that is hard to read as a non-expert, now I can have a data structure that we can format however we want, that non-experts can read. And even experts can just relax and not have to be full experts all the time using that part of your brain. So now you can start getting towards answering search-oriented questions. How many login attempts happened yesterday from this IP address?
Starting point is 00:14:13 Right. And back then, the way that people would do these things was Elasticsearch. So that's the thing you shove all your data into in a bunch of different ways, and you can run full-text queries on it, and that's great. But now we want to have that stuff actually structured. And that is sort of the magic of Logstash, which was used in conjunction with Elasticsearch a lot. And it turns out that typing random SQL queries into command line is not generally how most business users like to interact with this stuff. So it needs to be something dashboard-y like, and the project that folks used for that was Kibana. And ELK Stack became a thing because Elasticsearch in isolation can do a lot, but it doesn't get you all the way there for what people were using to look at logs.
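The idea of naming regular expressions and getting a data structure back can be sketched in Python. This is a minimal illustration, not Grok's actual implementation: the `PATTERNS` table, the `compile_template` and `parse` helpers, and the sample log lines are all hypothetical. The `%{NAME:field}` placeholder style is borrowed from the syntax Grok patterns use.

```python
import re
from collections import Counter

# A tiny, hypothetical pattern library: each name maps to a regular expression.
OCTET = r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"
PATTERNS = {
    "IPV4": rf"{OCTET}(?:\.{OCTET}){{3}}",
    "USER": r"[A-Za-z_][A-Za-z0-9_.-]*",
    "TIMESTAMP": r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z",
}


def compile_template(template):
    """Expand %{NAME:field} placeholders into named capture groups.
    Text outside the placeholders is treated as regular-expression source."""
    def expand(match):
        name, field = match.group(1), match.group(2)
        return f"(?P<{field}>{PATTERNS[name]})"
    return re.compile(re.sub(r"%\{(\w+):(\w+)\}", expand, template))


def parse(pattern, line):
    """Return the named fields as a dict, or None if the line does not match."""
    match = pattern.fullmatch(line)
    return match.groupdict() if match else None


LOGIN = compile_template("%{TIMESTAMP:time} login failed for %{USER:user} from %{IPV4:ip}")

lines = [
    "2021-09-28T23:59:01Z login failed for corey from 203.0.113.9",
    "2021-09-28T23:59:04Z login failed for jordan from 203.0.113.9",
    "2021-09-28T23:59:09Z login failed for corey from 198.51.100.7",
]
events = [parse(LOGIN, line) for line in lines]

# With structured events, the search-style question becomes a simple count.
attempts_by_ip = Counter(event["ip"] for event in events if event)
print(attempts_by_ip["203.0.113.9"])
```

The point of the design is that the hard-to-read expression for an IPv4 address lives in exactly one place. Fixing an edge case in `PATTERNS["IPV4"]` fixes every template that refers to it by name, which is the copy-and-paste maintenance problem described earlier.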
Starting point is 00:14:51 You're right. And Kibana is also one of the projects that Elastic owned. And at some point, someone looks around like, oh, Logstash, people are using that with us an awful lot. How big is the company that built that? Oh, it's an open source project run by some guy. Can we hire that guy? And the answer is apparently because you wound up working as an Elastic employee for a while. Yeah, it was kind of an interesting journey. So in the beginning of Logstash in sort of 2009, I kind of had this picture of how I wanted to solve log processing search challenges. And I broke it down into a
Starting point is 00:15:23 couple of parts. To be clear, I broke it down in my head, not into code. There's visualization, kind of exploration. There's the processing and transmission, and then there's storage and search. And I only felt confident really attempting a solution for one of those parts. And I picked log processing, partly because I already had a jumpstart from a couple of years prior working on Grok and feeling really comfortable with regular expressions. I don't want to say good because that's... You heard it here first. We found the person that knows regular expressions. And Logstash was being worked on to solve this problem of taking your data, processing it, and getting it somewhere. That's why Logstash has so many outputs, has so many inputs, and lots of filters.
Starting point is 00:16:09 And about, I think, a year into building Logstash, I had experimented with storage and search backends, and I never found something that really clicked with me. And I was experimenting with Lucene, and knowing that I could not complete this journey because the problem space is so large, it would be foolish of me to try to do distributed log storage or anything like that, plus visualization. I just didn't have the skills or the time in the day. I ended up writing a front end for Logstash called Logstash Web. Naming things is hard. And I wasn't particularly skilled
Starting point is 00:16:46 or attentive to that project. And it was more of a very lightweight front end to solve the visualization, the exploration aspect. And about a year into Logstash being alive, I found Elasticsearch. And what clicked with me from being a sysadmin and having worked at large data center companies in the past is I know the logs on a single system are going to quickly outgrow it. So whatever storage system will accept these logs, it's got to be easy to add new storage. And Elasticsearch's first day promise was it's distributed. You can add more nodes and go about your day. And it fulfilled that promise. And I think it still fulfills that promise that if you're going to be processing terabytes of data, yeah, just keep dumping it in there. That's one of the reasons I didn't try and even use MySQL or Postgres or other data
Starting point is 00:17:36 systems, because it didn't seem obvious how to have multiple storage servers collecting this data with those solutions for me at the time. It turns out that solving problems like this that are global and universal lead to massive adoption very quickly. I want to get this back a bit before you wind up joining Elastic, because you get up on stage and you talk through what this is. And I mentioned at the start of this recording that it was one of those transformative talks, but let's be clear here. I don't remember 95% of how Logstash works. Like, the technology you talked about 10 years ago is largely outmoded slash replaced slash outdated today. I assure you, I did not take anything of note whatsoever from your talk regarding
Starting point is 00:18:17 regular expressions, I promise. Good. But that's not the stuff that was transformative to me. What was, was the way that you talked about these things. And it was the first time I'd ever heard the phrase that if a new user has a bad time, it's a bug. This was 2012. The idea of empathy hadn't really penetrated into the ops and engineering spaces in any meaningful way yet. It was about gatekeeping.
Starting point is 00:18:40 It was about read the manual, fool, if people had questions. And it was about read the manual, fool, if people had questions. And it was actively user hostile. And it was something that I found transformative of, forget the technology piece for a second. This is a story about how it could be different. Because Logstash was the vehicle to deliver a message that transcended far beyond the boundaries of how to structure your logs, or maybe beyond the boundaries of regular expressions. I'm never quite sure where those things start and stop. But it was something that was actively transformative where you're on stage as someone who is a recognized authority in the space, and you're getting up there and you're sending an implicit message, both explicitly and by example,
Starting point is 00:19:19 of be nice to people, demonstrate empathy. And that left a hell of an impact. Thank you. I wound up doing a spot check just now. And I wound up looking at this, and sure enough, early in 2013, I wound up committing, it's still in the history of the change log for Logstash because it's open source. I committed two pull requests, 10 minutes apart, two submissions. I don't know if pull requests were even a thing back then, but it wound up in the log. Because another project you were renowned for was FPM, F'n Package Manager. Is that what the acronym stands for, or am I misremembering?
Starting point is 00:19:54 We'll go with that. I'm sure vulgar viewers will know what the F stands for, but you don't have to say it. It's F'n Package Management. Yeah. But yeah, I think I really do believe that if a user, especially if a new user has a bad time, it's a bug. And that came from many years of participating at various levels in open source, where if you came at it with like a tinkerers or a hacker's mindset, and you think this project is great, I would like it to do one additional thing.
Starting point is 00:20:25 And I would like to talk to someone about how to make it do that one additional thing. And you go find the owners or the maintainers of that project. And you come in with gusto and energy and you describe what you want to do. And first they say, what you want to do is not possible.
Starting point is 00:20:42 They don't even say they don't want to do it. They frame the whole universe against you. "It's not possible." "Why would you want to do that?" "If you want to make that, do it yourself." You know, none of these things are an extended hand, a lowered ladder, an open door, none of those. It's always, "You're bothering me, go away. Please read the documentation and see where we clearly," which they don't, "document that this is not a thing we're interested in." And I kind of came to the conclusion that any future open source or collaborative work that I worked on, it's got to be from a place where you're welcome and whatever contributions or participation levels you choose are okay.
Starting point is 00:21:27 And if you have an idea, let's talk about it. If you're having a bad time, let's figure out how to solve it. Maybe the solution is we point you in the right direction to the documentation if documentation exists. Maybe we find a bug that we need to fix. The idea that the way to build communities is through kindness and collaboration, not through walls or gatekeeping or just being rude. And I really do think that's one of the reasons Logstash became so successful. I mean, any particular technology could have succeeded in the space that Logstash did, but I believe that it did so because of that one piece of framework
Starting point is 00:22:06 where if a new user has a bad time, it's a bug. Because to me, that opens the door to say, yeah, you know what? Some of the code I write is not going to be good. Or the thing you want to do is undocumented. Or the documentation is out of date. It told you a lie and you followed the documentation and it misled you because it's incorrect, we can fix that. Maybe we don't have time to fix it right now. Maybe there's no one around to fix it. But we can at least say, you know what, that information is incorrect, and I'm sorry you were misled. Come on into the community and we'll figure it out. And one of the patterns I know is on the IRC channel, which is where the Logstash real-time community chat,
Starting point is 00:22:49 I don't know how to describe that. No, it was on Freenode. That's part of the reason I felt okay talking to you. At that point, I was volunteer network staff. This was before Freenode turned into basically a haven for Nazis this past year. Yeah, it was still called Lilo. LiloNet?
Starting point is 00:23:02 No, the Open Freedom Network. That predates me. This was, yeah, Lilo had died about six years prior. Oh, all right. But Freenode's been around a long time. What made this thing work was that I was network staff, and that means that I had a bit of perceived authority. It's a chat room, not really.
Starting point is 00:23:17 But it was one of those things where it was at least, okay, this is not just some sketchy drive by Rando, which I very much was, but I didn't present that way. So I could strike up conversations. But with you talking about this stuff, I never needed to be that person. It was just someone wants to pitch in on this. Great. More hands make lighter work. Sure. Yeah, for sure. And for me, the interesting part is not even around the Logstash aspect so much. It's your other project, FPM. Well, one of your other projects. Back in 2012, that was an interesting year for me. Another area that got very near and
Starting point is 00:23:46 dear to my heart in open source world was the SaltStack project. I was contributor number 15, and I didn't know how Python worked. Not that I do now, but I can fake it better now. And Tom Hatch, the guy that ran the project before it was a company, was famous for this, where I could send in horrifying levels of code. And every time he would merge it in, and then 10 minutes, there'd be another patch that comes in that fixes all the bugs I just introduced. And it was just such a warm onboarding. I'm not suggesting that approach, and I'm not saying it's scalable, but I started contributing. And I became the first Debian and Ubuntu packager for SaltStack, which was great. And I did a terrible job at it
Starting point is 00:24:27 because, let me explain. I don't know if it's any better now, but back in those days, there were multiple documentation sources on the proper way to package software. They were all contradictory with each other. There was no guidance as to when to follow each one. There was never a, you know nothing about packaging, here's what you need to know step by step. And when you get it wrong, they yell at you. And it turns out that the best practice then to get it formally accepted upstream, which is what I did, is do a crap ass job. And then you'll wind up with a grownup coming in like, this is awful, move, and then they'll fix it and yell at you and gatekeep like hell. And then you have a package that works and gets accepted upstream because the magic
Starting point is 00:25:09 incantation has been said somewhere. And what I loved about FPM was that I could take any random repo or any source tarball or anything I wanted, run it through with a single command, and it would wind up building out a RPM and a DEB file, and I don't know what else it supported. Those are the ones I cared about, that I could then install on a system. I could put a repo and add that to a sources list on systems and get it to automatically install. So I could use configuration management, like SaltStack, to wind up installing custom local packages. And oh my God, did the packaging communities for multiple different distros hate you. And specifically what you had built, because
Starting point is 00:25:52 this was not the proper way to package. How dare you solve an actual business problem someone has instead of forcing them to go to packaging school where the address is secret and you have to learn that. It was awful. It was the clearest example that I can come up with of gatekeeping. And then you're coming up with FPM, which gets rid of user pain. And I realized that in that fight between the church of orthodoxy of "this is how it should be done" and the "you're having a problem, here's a tool that makes it simple," I know exactly what side of that line I wanted to be on. And I hadn't always been previously. And that is what clarified it for me. Yeah. FPM was a, was like a really delightful
Starting point is 00:26:30 enjoyment for me to build. The origins of that was I worked at a company and they were all, I think at the time we were RPM based. And then as folks tend to do, I bounced around between jobs almost every year. So I went from one place that... Hey, it's me. Right? And there's absolutely nothing wrong with leaving every year or staying longer. It's just whatever progresses your career in the way that you want and keeps you safe and your family safe.
Starting point is 00:26:57 But we were using RPM and we were building packages already not following the orthodoxy. A lot of times, if you ask someone to build like a package for Fedora, they'll point you at like the Maximum RPM book. And that's a lot of pages. And honestly, I'm not going to sit down and read it. I just want to like take a bunch of files, name it and install it on 30 machines with Puppet. And that's what we were doing.
Starting point is 00:27:22 Cue one year later, I moved to a new company and we were using Debian packages and they're the same thing. What struck me is they are identical. It's a bunch of files and don't pedant me about this. It's a bunch of files with a name with some other sometimes useful metadata
Starting point is 00:27:41 like other names that you might depend on. And I really didn't find it enjoyable to transfer my knowledge of how to build RPMs and the tooling and the structures and the syntaxes to building Debian packages. And this was not for greater publication. This was, I have a bunch of internal applications I needed to package and deploy with, at the time it was Puppet. And it wasn't fun. So I did what we did with Grok, which was codify that knowledge to sort of reduce the burden.
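The observation that a package is "a bunch of files with a name" plus some metadata can be sketched in Python. This is a hypothetical illustration of the idea, not how FPM is implemented: the `Package` class, the two render functions, and the `myapp` example are invented here. Only a handful of real field names are shown (`Package`, `Version`, `Depends`, `Description` from a Debian control file; `Name`, `Version`, `Summary`, `Requires` from an RPM spec preamble), and a real package of either kind needs more than this.

```python
from dataclasses import dataclass, field


@dataclass
class Package:
    """Format-neutral description: a name, a version, files, and dependencies."""
    name: str
    version: str
    summary: str
    depends: list = field(default_factory=list)
    files: list = field(default_factory=list)


def debian_control(pkg):
    """Render a few Debian control-file fields from the neutral description."""
    lines = [f"Package: {pkg.name}", f"Version: {pkg.version}"]
    if pkg.depends:
        lines.append("Depends: " + ", ".join(pkg.depends))
    lines.append(f"Description: {pkg.summary}")
    return "\n".join(lines)


def rpm_spec_preamble(pkg):
    """Render a few RPM spec preamble tags from the same description."""
    lines = [f"Name: {pkg.name}", f"Version: {pkg.version}", f"Summary: {pkg.summary}"]
    lines += [f"Requires: {dep}" for dep in pkg.depends]
    return "\n".join(lines)


# Hypothetical internal application; the same record feeds both formats.
app = Package(
    name="myapp",
    version="1.0.0",
    summary="Internal application",
    depends=["openssl"],
    files=["/opt/myapp/bin/myapp"],
)

print(debian_control(app))
print()
print(rpm_spec_preamble(app))
```

One simplification worth flagging: the same dependency name is reused for both formats here, whereas real dependency names often differ between distributions.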
Starting point is 00:28:10 And after a few, probably a year or so of that, it really dawned on me that a generality is all packaging formats are largely solving the same problem. And I wanted to build something that was solving problems for folks like you and me, sysadmins who were handed a pile of code and they needed to get into production. And I wasn't interested in formalities or appeasing any priesthoods or orthodoxies about
Starting point is 00:28:42 what really, you know, you should really shine your package with this special wax kind of thing. And all of the documentation for Debian packages and Fedora packages is often dedicated to those projects: you're going to submit a package to Fedora so that the rest of the world can use it on Fedora. That wasn't my use case. I've built a thing, and the thing that I built is awesome, and I want the world to use it. So now I have to go to packaging school, not just once but twice, and possibly more? That's awful. Or more, yeah. And it's tough. This episode is sponsored in part by our friends at Jellyfish. So you're sitting in your office chair, bleary-eyed, parked in front of a PowerPoint, and oh, my sweet feathery
Starting point is 00:29:25 Jesus, it's the night before the board meeting, because of course it is. As you slot that crappy screenshot of traffic light-colored Excel tables into your deck, or sift through endless spreadsheets looking for just the right data set, have you ever wondered why is it that sales and marketing get all this shiny, awesome analytics and insight tools, whereas engineering basically gets left with the dregs? Well, the founders of Jellyfish certainly did. That's why they created the Jellyfish Engineering Management Platform, but don't you dare call it JEMP. Designed to make it simple to analyze your engineering organization, Jellyfish ingests signals from your tech stack, including JIRA, Git, and collaborative tools. Yes, depressing to think of those things as your tech stack,
Starting point is 00:30:10 but this is 2021. And they use that to create a model that accurately reflects just how the breakdown of engineering work aligns with your wider business objectives. In other words, it translates from code into spreadsheet. When you have to explain what you're doing from an engineering perspective to people whose primary IDE is Microsoft PowerPoint, consider Jellyfish. That's jellyfish.co and tell them Corey sent you. Watch for the wince. That's my favorite part. And this gets back to what I found. It was rare that I could find a way to contribute to something meaningfully. And I was using Logstash after your talk. I'd started using it and rolling it out somewhere. And I discovered that there wasn't a Debian package. And the thing is, you would never frame it this way. But the answer was, of course, pull request welcome, which is often an invitation
Starting point is 00:31:10 to do free volunteer work for companies. But this was an open source project that was not backed by a publicly traded company. It was some guy. And of course, I'll pitch in on that. And I checked the commit log on this for what it is that I see. And sure enough, I have two commits. The first one was on Sunday night in February of 2013. And my commit message was initial packaging work for deb building. And sure enough, there's a bunch of files I put up there and it's great. And my second and last commit was 12 minutes later saying, remove large binary because I'm foolish. Yeah. Is that you? Oh yeah, I'm sure. Yeah, it was great. I didn't know how Git worked back then. I'm sure it's still in the history there. I wonder how big that binary is and exactly how much I have screwed people over
Starting point is 00:31:58 in the last decade since. I've noticed this over time and every now and then you'd be, I would be, or someone would be on a slow internet connection, which again is something that we need to optimize for, or at least be aware of and help where we can. Someone would be cloning Logstash on an airplane or something like that, or in a rural setting. And they would, say, get stuck at 76% for like 10 minutes. And you would go back and dust off your tome of how to use Git because it's
Starting point is 00:32:25 a very difficult piece of software to use. And you would find this one blob, and I never even looked at who committed it or whatever, but it was like, I think it was 80 megs of a jar file or a Debian package that was the Logstash release. That's such a small world that you're like, yep, that was me. Oh yeah, oh yeah. Let's check this just for fun here. To be clear, the entire repository right now is 167 megs. So that file that I had up there for all of 13 minutes lives indelibly in Git's history and it is fully half of the size
Starting point is 00:32:59 of the entirety of the Logstash project. All right then, I didn't realize this was one of those confess your sins episodes, but here we are. Look, sometimes we put flags on the moon. Sometimes we put big files in Git. You could, just for posterity, we could go back and edit the history and remove that, but it never became important to do it.
Starting point is 00:33:19 It wasn't loud. People weren't upset enough by it, or it didn't come up enough to say, you know what, this is a big file. So it's there. You left your mark. You know, we take what we can get. It's an odd time. I'll have to do some digging around. I'm sure I'll tweet about this as soon as I get a little more, a bit more data on it. But I wonder how often people have had frustration caused by like, there's no ill intent here to be very clear, but it was instead a, I didn't know how Git worked very well. I didn't know what I was doing in a lot of respects. And sure enough,
Starting point is 00:33:50 in the fullness of time, some condescending package people came in and actually made this right. And there is a reasonable responsible package now because surprise, of course there is, but I wonder how much inadvertent pain that caused people by that ridiculous commit. And it's the idea of impact and how this stuff works. Like I'm not happy that people on a plane with slow connection had to wait an extra minute or two to download that nonsense. It's one of those things that is, oops, I feel like a bit of a heel for that. Not for not knowing something, but for causing harm to folks. It's intent doesn't outweigh impact. There is a lesson in there for it. Agreed. On that example, I think one of the things,
Starting point is 00:34:27 code is not the most important thing I can contribute to a project, even though I feel very confident in my skills and programming in a variety of environments. I think the number one thing I can do is listen and look for sources of pain. And people would come in and say, I can't get this to work. And we would work together and figure out
Starting point is 00:34:44 how to make it work for their use case. And that could result in a new feature, a bug fix, or some documentation improvements, or a blog post or something like that. And I think in this case, I don't really recall any amount of noise for someone saying, cloning the Git repository is just a pain in the butt. And I think a lot of that is because either the people who would be negatively impacted by that weren't doing that use case, they were downloading the releases, which were as small as we could possibly get them, or they were editing files using the GitHub online edit-the-file thing, which is a totally acceptable, perfectly fine way to do
Starting point is 00:35:23 things in Git. So I don't remember anyone complaining about that particular file size issue. The Elasticsearch repository is massive. And I don't think it even has binaries. It just has so much more. Someone accidentally committed their entire production test data set at one point, and oopsie doozy. Yeah, it is not the most egregious harm I've ever caused. Yeah, it's there. The thing that I guess resonates with me and still does is the lessons I learned from you, I could sum them up as being not just empathy driven, because that's the easy answer, but the other layers were that you didn't need to be the world's greatest expert in a thing in order to credibly give a conference talk.
Starting point is 00:36:04 To be clear, you were miles ahead of me and still are in a lot of different areas. And that's fine, but you don't need to be the, like, you are not the world's greatest expert on empathy, but that's what I took from the talk and that's what it was about. It also taught me that things you can pick up from talks and other means are, there are things you can talk about in terms of technology and there are things you can talk about in terms of people. And the things about people do not have expiration dates in the same way that technology does. And if I'm going to be remembered for impact on people versus impact on technology, for me, there's no contest. And you forced me to really think about
Starting point is 00:36:38 a lot of those things, and it started my path to, I guess, becoming a public speaker and then later all of the rest that followed, like this podcast, the nonsense on Twitter, and all the rest. So it is, I guess, we can lay the responsibility for all that at your feet. Enjoy the hate mail. Ah, my email address is now closed. I'm sorry. Exactly. Well, I appreciate the kind words. We'll get letters on this one. It's the impact that people have. And sometimes, I don't think you knew at the time that that's the impact you were having.
Starting point is 00:37:05 It matters. I agree. I think a lot of it came from, how do I want to experience this? And it was much later that it became something that was really outside of me in the sense that it was building communities. One of the things I learned shortly after, or even just before joining Elastic, was how many folks were looking to solve a problem, found Logstash, became a participant in the community. And that participation could just
Starting point is 00:37:30 be anything, just hanging out on IRC, on the mailing list, whatever. And the next step for them was to get a better paying job in an environment they enjoyed that helped them take the next step in their career. Some of those people came to work with me at Elastic. Some of them started to work on the Logstash team. At some point, they decided, because a lot of Logstash users were sysadmins, and on the Logstash team, we were all developers. We weren't sysadmins.
Starting point is 00:37:57 There was nothing to operate. And a lot of folks would come on board, and they were like, you know what? I'm not enjoying writing Ruby for my job. And they could take the next step to transition to the support team or the sales engineer team or cloud operations team at Elastic. So it was really, like you mentioned, it has nothing to do with the technology of, to me, why these projects are important. They became an amplifier and a hand to pull people up to go the next step they need to go.
Starting point is 00:38:27 And on the way, maybe they can make a positive impact in the communities they participate in. If those happen to be FPM or Logstash, that's great. But I think I want folks to see that technology doesn't have to be a grind of getting through gatekeepers, meeting artificial barriers and things like that. The thing that I took to is that I gave a talk in 2015 or 16, which is strangely appropriate now, terrible ideas in Git. And yes, checking large binaries in is one of the terrible ideas I talk
Starting point is 00:39:01 about. It's Git through counter example. And around that time, I also gave a talk for a while on how to handle a job interview and advance your career. Only one of those talks has resulted in people approaching me even years later saying that what I did had changed aspects of their life. It wasn't the Git one. And that's the impact it comes down to. That is the change that I wanted to start having because I saw someone else do it and realized, you know, maybe I could possibly be that good someday. Well, I'd like to think I made it on some level. I'm proud of the impact you've made. And I agree with you. It is about people. Even with FPM, where I was very selfishly tickling my own itch, I don't want to remember all of this stuff. And I also enjoy operating outside of the boundaries of a church or, you know, whatever the priesthoods that say this is how you must do a thing. I knew there was a lot of folks who worked at jobs and they didn't have authority and they had to deploy something.
Starting point is 00:39:58 And they knew if they could just package it into a Debian format or an RPM format or whatever they needed to do, they could get it deployed and it would make their lives easier. Well, they didn't have the time or the energy or the support in order to learn how to do that. And FPM brought them that success where you can say, here's a bunch of files, here's a name, poof, you have a package for whatever format you want. Where I found FPM really take off is when Gem and Python and Node.js support were added. The sysadmins were kind of sandwiched in between two impossible worlds where they are only authorized to deploy a certain package format,
Starting point is 00:40:37 but all of their internal application developer teams were using Node.js and newer technologies and all of those package formats were not permitted by whoever had the authority to permit those things at their job. But now they had a tool that said, you know what, we can just take that thing. We'll take Django in Python and we'll make it an RPM and we won't have to think a lot about it. And that really, I think, to me, my hope was that it de-stresses that sort of work environment where you're not having to do three weeks of brand new work every time someone releases something internally at your company. You can just run a script that you wrote a month ago and maintain it as you go. Wouldn't that be something?
Starting point is 00:41:20 Ideally, ideally. Jordan, I want to thank you for not only the stuff you did 10 years ago, but also the stuff you just said now. If people want to learn more about you, how you view the world, see what you're up to these days, where can they find you? I'm mostly active on Twitter, @jordansissel, all one word. Mostly these days, I post repair stuff I do on the house. I'm a stay-at-home full-time dad these days. And I'm still doing maintenance on the projects that need maintenance,
Starting point is 00:41:51 like FPM or xdotool. So if you're one of those users, I hope you're happy. If you're not happy, please reach out, and we'll figure out what the next steps can be. But yeah, if you like bugs, especially spiders, or if you don't like spiders and you want to like spiders, check me out on Twitter. I'm often posting macro photos, close-up photos of butterflies, bees, spiders, and the like. And we will, of course, throw links to that in the show notes.
Starting point is 00:42:17 Jordan, thank you so much for your time today. It's appreciated. Thank you, Corey. It's good talking to you. Jordan Sissel, founder of Logstash, and currently blissfully not working on a particular corporate job. I envy him some days. I'm cloud economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice. Whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice, along with an angry comment in which you have also embedded a large binary. If your AWS bill keeps rising and your blood pressure is doing the same,
Starting point is 00:42:57 then you need the Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business, and we get to the point. Visit duckbillgroup.com to get started. This has been a HumblePod production. Stay humble.
