The Changelog: Software Development, Open Source - Inside 2019's infrastructure for Changelog.com (Interview)

Episode Date: May 5, 2019

We're talking with Gerhard Lazu, our resident ops and infrastructure expert, about the setup we've rolled out for 2019. Late 2016 we relaunched Changelog.com as a new Phoenix/Elixir application and that included a brand new infrastructure and deployment process. 2019’s infrastructure update includes Linode, CoreOS, Docker, CircleCI, Rollbar, Fastly, Netdata, and more — and we talk through all the details on this show. This show is also an open invite to you and the rest of the community to join us in Slack and learn and contribute to Changelog.com. Head to changelog.com/community to get started.

Transcript
Starting point is 00:00:00 Bandwidth for ChangeLog is provided by Fastly. Learn more at Fastly.com. We move fast and fix things here at ChangeLog because of Rollbar. Check them out at Rollbar.com. And we're hosted on Linode cloud servers. Head to Linode.com slash ChangeLog. This episode is brought to you by Linode, our cloud server of choice. And we're excited to share the recent launch of dedicated CPU instances.
Starting point is 00:00:22 If you have build boxes, CI/CD, video encoding, machine learning, game servers, databases, data mining, or application servers that need to be full duty, 100% CPU all day, every day, then check out Linode's dedicated CPU instances. These instances are fully dedicated and shared with no one else, so there's no CPU steal or competing for these resources with other Linodes. Pricing is very competitive and starts out at 30 bucks a month. Learn more and get started at linode.com slash changelog. Again, linode.com slash changelog. All right, welcome back, everyone. This is the ChangeLog, a podcast featuring the hackers, the leaders, and the innovators of software development.
Starting point is 00:01:12 I'm Adam Stachowiak, Editor-in-Chief here at ChangeLog. On today's show, we invited Gerhard Lazu back on the show, our resident ops and infrastructure expert, to talk about the setup we rolled out for 2019. In late 2016, we relaunched Changelog.com as a new Phoenix/Elixir application, and that included a brand new infrastructure and deployment process. 2019's infrastructure update includes Linode, CoreOS, Docker, CircleCI, Rollbar, Fastly, Netdata, and more. And we talk through all the details here on today's show. This show is also an open invite to you and the rest of the community to join us in Slack, and to learn and to contribute to Changelog.com as you see fit.
Starting point is 00:01:51 So head to changelog.com slash community to get started, and I'll see you in Slack. It is 2019. It's a new year. It is. Obviously. We're way into this year, as a matter of fact. Q2. But in terms of infrastructure, we're kind of new, right? This has been out for a few months now.
Starting point is 00:02:10 gerhard's here jared's here we're talking about the infrastructure for changelog.com in 2019 it's changed since the initial time we did this show which was 254 back in june 2017 is that right yep that's right that's right. That's right. Almost two years. Almost two years. So new infrastructure. What's new? What's going on here? Everything is simpler than it was before. We've learned a lot. We've cut a lot of the fat that we had in the past. We've learned from our mistakes. We haven't made that many new mistakes, which is always good, but we have learned a lot. And one big difference is that not only we have embraced all our partners fully with this new infrastructure, but also we have blogged about it on time and we have shared
Starting point is 00:02:55 a lot more with the audience and with our users than the last time we did this. That is true. I like how you added the on time there. I almost feel like that was self-inflicted, not like Jared Nye is saying it's on time. There's a subtweet in there. Yeah. You did blog about it on time and very well, a very nice post,
Starting point is 00:03:12 which we'll of course include in the show notes. Possibly a lot of our listeners have read that post. I know a lot of our listeners two years ago enjoyed episode 254, and I was wondering why why and a few people told us i think that you know our infrastructure is um a nice example of a real world application it's not a large application it is a custom application but because we have at least gerhard and i with regards to the deployment have intimate knowledge of the needs and the decisions and all these things.
Starting point is 00:03:46 It's less Adam and I poking at a black box that somebody else owns and more us talking about a black box that we participated in creating. And so there's some real-world takeaways, we hope. So when we did this two years ago, Gerhard, why don't you just real quickly, you said lots of change, we've simplified. And we're going to focus, obviously, on the new infrastructure. But in order to talk about simplification, can you at least review the old way we deployed changelog.com before this new shiny?
Starting point is 00:04:15 Yes, of course. So two years ago, we were using Ansible to do a lot of the orchestration and a lot of the tasks running on our infrastructure. We were using Concourse for our CI, which we had to run ourselves. It was not a managed service. And this year, so there's no more Ansible, there's no more Concourse. Everything that runs is self-contained within Docker, within Docker Swarm. So we have services, everything is nicely defined. And the core of the changelog infrastructure is a Docker stack. And at the core of driving that infrastructure, previously Ansible, basically Gerard has replaced Ansible with a part of my friends, just a badass makefile. If I might just break our own rules here, Adam,
Starting point is 00:05:05 and go explicit for a second. A makefile that does all these things. It's kind of amazing. Yes, the makefile was an interesting twist to what we had before, because before we had to run all these commands and there were a couple of scripts which you had to remember.
Starting point is 00:05:21 And it felt a bit ad hoc. Having a single makefile, what it gives us is a single place to go and to look at all the things that we could do. So for example, if we just run Make in our repository, the changelog repository, we see all the targets and Make their call targets. So just to give you a couple of examples, Make SSH and we can SSH into hosts. Or, for example, make contrib, which allows someone that's cloned the repository for the first time to make contributions to changelog.com. And that's something which you did not have before. So there are a couple of very nice things that make life easy of developers, which are also operators.
Starting point is 00:06:03 The barrier of entry is very, very low, and it's just easy and nice to work with. Where did you get your make skills? Because I've opened make files. I've looked at them and thought, oh, wow. A lot of them, I think, are auto-generated from like AutoConf or something. But mostly it's like, I'm going to compile
Starting point is 00:06:20 the software using make, and that's what I'd call the core use case. But obviously obviously there's lots of things that you can do with make and you're doing a lot of them it almost is like a scripting tool where'd you get these these wizard make skills where'd you learn this stuff okay so there is a very good question and i was i was convinced you're going to ask me that actually uh one of my mates he was saying i will listen to the changelog show that will come out because I want to learn more about Make. So I was sure that someone will ask me this question.
Starting point is 00:06:52 I'm happy to be your pawn. Yeah, go ahead. Please answer it. So I didn't know Make until a couple of years ago. I mean, Make is this really old build tool. And when I say really old, I mean in the best possible way in that it has these decades of experience that went into it, a lot of sharp edges for sure, but there's a lot of good documentation and a lot of good examples as well as bad ones. But all put together, it's a mature tool. So I first picked up Make while joining the RabbitMQ team.
Starting point is 00:07:29 The RabbitMQ team had the entire build system is based on Make and on a tool called Erlang.mk. Erlang.mk is a tool that's built by Loic Hogwin, and he's also the creator of Cowboy, which is the web server that Phoenix uses behind the scenes, as well as a bunch of other projects. So Make is still very present in the world of Erlang and definitely in the world of RabbitMQ,
Starting point is 00:07:58 and that's where I picked up a lot of the Make skills. So is it a tool that you commonly go to? Was this out of the norm for you to reach to it for this particular task? Or is it just one of those second nature kind of like your hammer nowadays? I think it's a bit of both, to be honest. I used bash for a lot of things. And I had scripts all over the place. And what I've seen in Make was the potential to make all these different scripts saner. So I really like the fact how you can have Make targets that depend on other Make targets, and it has a certain composability that Bash scripts didn't have. It's also simple in that
Starting point is 00:08:42 it comes pre-installed on every system. I mean, on Mac, you just need the Xcode tools and you have Make. Okay, it's an old one. It's version 3. You want a newer Make to benefit from some of the newer features. But all in all, it's omnipresent. It's almost like Bash. It's everywhere.
Starting point is 00:08:58 You don't need to do anything special to get it. And I've seen the potential of just improving a little bit an old approach that I had for many years. And it worked well. So I could see it work on a big project, which has many dependencies, which has many things that have to happen. And I just got to appreciate it. Very cool. Well, let's, let's back up a step because maybe we've gotten ahead of ourself. And I am totally to blame here because i want to hop into the nerdy details of this makefile but we described a little bit of the deployment infrastructure that was previous there we talked about simplifying it but we haven't actually described what it is so you mentioned
Starting point is 00:09:38 cowboy is a erlang based web server that that Phoenix rides on, so to speak. Just to take the Phoenix metaphor, I guess. Does a Phoenix ride? I don't know. It's on top of Cowboy. So what is Change.com? So I can answer that one because I work here. Change.com is an Elixir-based Phoenix application, so it runs that technology stack. And it was previously deployed a couple years
Starting point is 00:10:08 back using what you said there gearheart ansible docker uh concourse ci and we can talk a little bit about you know each of these moving parts and one of the goals this time around was let's get rid of some of the moving parts let's let's uh slim it down if we can, make it simpler to run, simpler to understand. For me, as not the primary author of the infrastructure, and generally try to remove as many parts as we can, rely more on modern technologies as well as more on Linode,
Starting point is 00:10:40 their load balancers, for example. But the base of it is, it's a Phoenix, it's an Elixir-based application that uses a proxy, Nginx proxy. It has a Postgres database. It has requirements for some local file storage because that's how we upload our episodes
Starting point is 00:10:58 into the system and process them with ID3 tags and whatnot using FFmpeg. And so those are kind of the constraints that we're working inside of, what we're trying to actually deploy. And then the way we go about deploying is what we're changing and what we're trying to improve upon. So when you describe the new setup, you start off by saying that Change.com is a simple three-tier web app.
Starting point is 00:11:24 We've talked about the makefile. You mentioned that we've simplified removing Ansible, and we're using Docker Swarm. Can you go a little bit step-by-step through those tiers and through the way it looks, and then we can talk about some of the decisions that we made along the way and what we were trying to get accomplished.
Starting point is 00:11:39 But first of all, get a lay of the land, not as it was back in 2016, 2017, but as it is right now. So changelog, as Jared very well put, is this simple, well, it's more complex, but it's just like just a web application. It seems simple, right? It seems simple from the outside. Yeah, it does. It does a lot of things, a lot of heavy lifting. It does a lot of just processing, which I think would be very complicated if you were not using Phoenix and if you're not using the Erlang VM
Starting point is 00:12:13 and we can circle back to that. But it's just a web application in a nutshell, which needs a database. And it's 3T in that we have a proxy in front. It's a little bit more complicated than that because the proxy does a couple of things, but it's just your typical web application. So the core, this core like database, application, proxy, all is described as a Docker stack. So a Docker stack you can define your services that make up your stack. In this case we have a bunch of
Starting point is 00:12:54 services, there are a couple more that we define, but they form the core of changelog. The nice thing about this is that because we describe the core of changelog in a Docker stack, we can spin it up locally on a machine that runs a Docker daemon that is configured to run in swarm mode, which is very simple, just Docker swarm in it, and your Docker desktop all of a sudden runs in swarm mode where you can deploy Docker stacks. That allows anyone on the team to run the core of changelog. When I say the core of changelog, I mean everything except the IaaS bits. So anything that for example can only run in Linode, like a load balancer, a node balance in their case, or a block storage volume,
Starting point is 00:13:42 obviously you couldn't get that locally, but you can get everything that makes ChangeLog. And we can set up production, identical copies of our production locally because we describe everything declaratively in this Docker stack. So this description of what is ChangeLog before it existed in these Ansible tasks. And it was a bit difficult to understand. You couldn't go into one place and see it all. It was like a file over here and a file over there, and this thing had to run, and then that other thing had to run, and they had to run in sequence. And eventually you would get changelog with all these components, which was a bit more difficult to visualize and understand. So we have the Docker stack that describes what changelog is at its core. And then we have all the IaaS components that are around it.
Starting point is 00:14:36 As I mentioned, Node Balancer, it terminates SSL connections, gives us the high uptime. It's all managed by Linode and it's very nice to work with. That has a couple of benefits. For example, we can define multiple versions of changelog and we can just modify the balancer to route to different versions, which is when we do breaking upgrades, which hasn't happened yet, but I'm sure it will, it's a nice way of making them with minimal downtime with minimal impact on users. In front of the load balance, so we always had this, we had Fastly, which we use for CDN,
Starting point is 00:15:16 which basically caches all the content, has some nice features, for example, we get IPv6 for free, we get HTTP2 for free. So everything that's static benefits from these features, which are all Fastly features. And we just leverage them. It's much better than building yourself, right? That's for sure.
Starting point is 00:15:32 That's a very complicated problem to tackle yourself. And you definitely couldn't do it as a one-man operation. It's too complex, too many variables, too many things to worry about. Well, it seems like in this day, you may think that CDNs are pretty well known, but some people might just reach for block storage, for example, because if it's just assets, maybe like, well, I want to store this on my own, not think about robotic caching or all these other features like IPv6, for example, and the logging we have, the streaming logs and stuff like that. So it's interesting that some might just reach for block storage,
Starting point is 00:16:08 but in this case, a CDN really gives us some superpowers that we didn't have to build ourselves. Exactly. So the truth is that we still use block storage, but we use block storage, which is a Linode feature, which is completely like a pig to the CDN. All the CDN does, it will read some static files from somewhere, it doesn't matter where they come from, and then it will cache them. So that whether we use block storage or whether you use like a local volume, that doesn't really matter as far as the CDN is concerned. Once the file is in CDN, it's in the CDN. The CDN just makes sure that our content
Starting point is 00:16:47 is quickly accessible, MP3s, especially MP3s, through the entire world. So they get served from the edge locations rather than from the Lino data center where the change log application runs and where the source and the origin of all the media and static assets is. Which is really important for us because we literally have a worldwide audience. where the source and the origin of all the media and static assets is.
Starting point is 00:17:07 Which is really important for us because we literally have a worldwide audience. Like we have listeners in Japan, listeners in South America, listeners in Africa, New Zealand, where you're at. Not where you're at in New Zealand, but where you're at in London. You're in London, right? Yeah, that's right. That's right. You know, all over the US, North America. So, I mean, we literally have a global audience. And so we had to have a CDN to
Starting point is 00:17:25 truly respect. I think an average MP3 is around 100 megs, roughly. But still, yeah, that's a decent amount to pull down. And you wouldn't want to pull it from Dallas, Texas or, you know, New York or New Jersey or something like that. You'd want to pull it from your local pot so you can get it faster. Definitely. That's definitely a good thing to have. And in our case, the cherry on top is how we can just get IPv6 and HTTP2 with no effort on our part, simply
Starting point is 00:17:53 because our CDN provider does that. So that's great. Thank you, Fastly, for that. Quick question about the block storage. And this is like, does linode support this feature kind of question so maybe not asking the right folks but maybe gearhard you know um could you set up a block storage in like a cdn mode where like they'll basically distribute that
Starting point is 00:18:14 storage geographically and it could be like like a cdn block storage because that would that would be cool um i don't know whether Linode has that feature, to be honest. I'm more familiar with Google's cloud. I work a lot with Google's cloud, and I know that their storage volumes, they have different modes in which they run, but by default they have multi-region. So basically every write goes through to three separate, actually that's multi-region so basically every write goes through to three separate
Starting point is 00:18:45 actually that's multi-region sorry that's that's Google Cloud Storage which I'm thinking which is similar to s3 I'm sorry I was thinking about the volumes the gosh I forget what they're called now they're the persistent disks the persistent disks which you get they're actually replicated all rights are in three separate zones so it's one region, but three zones. There are the SSD drives that you tend to get, which is similar to the Linode's block storage. As far as I know, Linode doesn't have something similar to S3, which is more like object storage, which is what we would want to make sure that we are consuming an API when it comes to our storage. We're not consuming devices, which is what a block storage is. It just presents a volume,
Starting point is 00:19:29 you mount it, but it's not like on the physical machine. Right. And when you say physical machine, you mean virtual machine? In virtual machine, of course, yes. It's not on the virtual machine, yes. Physical virtual machine, yes. The physical virtual machine, yes. By default, though, S3 is not a CDN, though, right? You still have to layer on some CDN on top of even S3 if you were going to use S3 for object storage, for example. That is correct, yes. Right, CloudFront. But the difference between, for example, block storage and S3
Starting point is 00:20:01 is that S3, you consume it by an API, and you work with objects. You don't work with files. I mean, you can, but it's performance is not good. If you do that, they're like different modes in which you can use that, but they're like, I wouldn't say hacks, not classes either, but it's not how it's meant to be used. They're meant to put a file, get the file files you have operations like that yeah and maybe we can let's stop for a moment and i will explain a little bit because people might be wondering why are you not using an object storage and um
Starting point is 00:20:36 and gearhard you asked me this as well like why do we need local storage because in the age of 2019 or i guess when we started this in 2016 even then i had been using heroku for many years and they will not allow local storage of files right it has to be transient it has to be ephemeral and you store things in memcast or you store them uh in s3 or you know services and the reason is very lame and it's very simple is that it's easier for me to develop software against with a concept of local files it's easier for uploads it's easier for metadata it's easier for uh ffmpeg we just shell out to ffmpeg to do our id3 tags a feature that we need and one that i at the time and probably still to today would have slowed me down dramatically
Starting point is 00:21:26 to have to write an elixir based ID three tag thing now ID three v one is out there but there's no ID three v two library which maybe eventually will write anyways if you're out there and interested holler at us if you'd like to write something like that but I wanted to shout out to FFmpeg and pass it a file and let it do its thing that's a very simple and straightforward way for me to get things done and so that's just kind of like the the end the end user demanded at gerhard and the end user was me i said no we're just going to go ahead and use local file storage like like old curmudgeons and that's that decision has uh had its pros and cons through the years but here we are with block storage i have a little twist to that way i like to to what you said and i'd
Starting point is 00:22:14 like to offer like an alternative approach sure it's very important for anyone to take stock of how they work and what works for them. The worst mistakes which I've seen in my career is teams that choose something for what it could be or what they think it could be and not for what it is. So when you know what works for you and you know what you're comfortable with, rather than picking a tool for its merits, you need to think, well, how is this going to work for me? Which is a very difficult question to answer. So rather than going to, let's say, Kubernetes, right? Because everyone is using Kubernetes these days.
Starting point is 00:23:00 Rather than saying, we're going to use Kubernetes and that's the end of it, I was saying, we're going to use Kubernetes and that's the end of it, I was saying, well, hang on. What's the simplest thing that would improve what we have that would be better without, for example, picking a tool before we know whether it fits us? So you mentioned log storage. It's easy. It's comfortable. It might not be great, but it works for us. And I'm sure like the day will come when we will replace that with something else. But it was too big of a step. We had to make smaller steps and for example replacing the CI and changing the way we define changelog, like the core of changelog, that was an easier step to make, which made us
Starting point is 00:23:45 just made everything simpler and nicer to work with. And one day, I'm sure the time to replace block storage and the time to use Kubernetes will come, but it was not in this situation. This episode is brought to you by GoCD. With native integrations for Kubernetes and a Helm chart to quickly get started, GoCD is an easy choice for cloud-native teams. With GoCD running on Kubernetes, you define your build workflow and let GoCD provision and scale build infrastructure on the fly for you. GoCD installs as a Kubernetes native application,
Starting point is 00:24:26 which allows for ease of operations, easily upgrade and maintain GoCD using Helm, scale your build infrastructure elastically with a new Elastic Agent that uses Kubernetes conventions to dynamically scale GoCD agents. GoCD also has first-class integration with Docker registries, easily compose, track, and visualize deployments on kubernetes learn more and get started at go cd.org slash kubernetes again go cd.org slash kubernetes So simplification was one of our main goals.
Starting point is 00:25:07 Another goal that we had was to allow a better integration into GitHub full request flow for contributors with running the tests and integrating with a CI. Concourse CI does a lot of things really well. And Gerhard, you actually introduced it to me back then. And I enjoyed a lot of its benefits one of the things that doesn't do well was what I think is a rather simple feature uh just because of the infrastructure the the architecture of concourse ci which is really a power tool for
Starting point is 00:25:38 building your own pipelines is that this doesn't really integrate very well into that, the needs of GitHub for those pull request builds. And with that and other reasons, we've replaced Concord CI with CircleCI, this version around. Gerhard, walk us through that. CircleCI, what it's doing, how we got here, etc. So if I remember this correctly, your exact words were Concord is a black box. It's nice, but I don't understand it. Can we use something else?
Starting point is 00:26:14 That was pretty much it. Something like that. Yeah. Yeah. I mean, you didn't like the fact that we had to manage it ourselves, which was definitely a big drawback. It was a black box for sure. There was an integration between Ansible and Concourse to where as a person that was just trying to use it, I couldn't understand, like, is this an Ansible thing? Is this a Concourse thing? And do I change it here or there? And what do I run locally?
Starting point is 00:26:44 What do I run in concourse etc and i never actually grasped the entire mental model which is why one of the the major goals to simplify was to was to you know fit this inside my monkey brain concourse i i i completely agree with you concourse is a very complicated tool um and it's you can do some great things with it. For example, the entire RabbitMQ pipeline, and there are many pipelines, are run by concourse. And it's a beast, right? It's a beast.
Starting point is 00:27:14 And that's overkill for what changelog needed. Circle CI ticked a lot of boxes. And I think the most important box, which it ticked, at least from my perspective, was that they are our partners so if changelog is recommending that you know hey check them out check this service out it's it's a cool service how would we know if we're not using them yeah I mean on paper they are great but we're not using them we're not dogfooding them so we don't
Starting point is 00:27:43 know how great they actually are and when they have some scaling issues, when they have just growing pains, we're not aware of that. So it's very difficult to empathize with the people that we recommend to use this tool. And from my perspective, that was the biggest reason to switch to CircleCI. Manage less infrastructure, use and embrace our partners, and just see what the experience is like. Not only that, but because we are using CircleCI, our users can see how we use it and they can maybe try and mimic what we do as a starting point and then make it their own, obviously. CircleCI is a lot lot simpler in terms of what it does for us. I'm sure you can do some amazing things with CircleCI but we don't. The reason why we don't is because
Starting point is 00:28:33 switching from concourse, well the reality was I had to do a lot of things when it came to the switch and I was thinking okay what's the simplest thing to use CircleCI to do all the things that we need, but not the more complex bits? And are these things better suited, for example, for Docker? Can we define this as a loop? For example, the application updater. We have this service that runs in Docker that updates our application. So this has a couple of benefits. First of all, it decouples CI from production. So CI is not aware about production. All CI does is run tests, obviously run the builds, resolve all the dependencies, do the packaging, do the assets, and then it
Starting point is 00:29:16 pushes the resulting Docker image to Docker Hub. And that's it. That's where the CI stops. It doesn't have access to any secrets, to production secrets, which is great from a security perspective. And if, let's say, one day you wanted to try another CI system, it would be fairly easy because it does only a subset of the things that we needed to. And it's not tightly coupled with everything else. So moving from concourse to CircleCI was somewhat painful because we were putting more in CI than could have, should have been.
Starting point is 00:29:51 As you mentioned, moving some things to Docker, this application update loop, so on and so forth. Is there such thing as like, obviously it seems that way, but like CI lock-in where there's so many interesting tools in a platform and or open source ci for example even you know that you can kind of get locked into those particular platforms definitely i would definitely agree with that and we were somewhat locked into concourse because we had concourse do things that other cis cannot do as easily. I mean, sure, it's possible, right? Anything is possible, but at what cost? And in our case, we almost had to take a baseline,
Starting point is 00:30:31 and we had to use a CI for the things that we wanted, like Jared mentioned, PRs. Concourse wasn't good at PRs for us. It can do them, but they're not as first class as some of the other things. So PRs in CircleCI are very easy. They're so easy. And that was a nice feature, which we embraced.
Starting point is 00:30:52 But for example, orchestrating our infrastructure, I would not have put that in any CI, to be honest. Concord is a bit more than a CI. As Jared was alluding to to it's this automation platform almost it does so many things and some it does them really well but it's a very complex system it's not a CI
Starting point is 00:31:14 it's a lot more than a CI so CircleCI would just use it run our tests as I said and there's a pull request run our tests resolve all the dependencies package assets and then produce a build artifact. And that's it.
Starting point is 00:31:29 Walk me through a lifecycle then of what it means to be continuously built and monitored. So as we do local changes, we're fleshing out a new feature, we ship it to GitHub. Walk us through the lifecycle of what it means to be continuously deployed and then also monitored. So let's take the most common path, which is the one where someone pushes a change into master branch. So we're not creating any branches, we're pushing a small change to the master branch. When the master branch updates, CircleCI receives the update via webhook and it runs the pipeline. By the way, we can also define pipelines in CircleCI, which is very nice. The pipeline, all it does is the first thing, it resolves the dependencies,
Starting point is 00:32:16 makes sure that we have everything we need. Then it runs the tests. It compiles assets, any assets that we need. This is the CSS, JavaScript, images, and then it publishes an image to Docker Hub. We have this application update, which I mentioned, runs within our production system and it continuously checks to see if there is a new image. If there is a new image that was published to Docker Hub, it pulls that image and it will spin up a new instance of changelog. If this new instance passes all the checks, there are a couple of health checks which we define, then it gets promoted. So we have blue-green deploy. It gets promoted to master and to the live application.
Starting point is 00:33:11 And the old application just gets spun down. If there are any issues with a new application, for example, it doesn't pass its health checks, the deploy fails. And we get a notification by using Rollbar in our Slack. So whenever there's a deploy, whether it's a good one or a bad one, we always get notification in our slack viral bar. So you can see what gets deployed, who deploys it and when it was deployed, which by the way, you can participate in all this via change.com slash community free sign up, grab a slack invite free slack invite up into pound dev, free pound dev, all things free here. And you can see those roll bars flying in throughout the week.
Starting point is 00:33:48 And maybe you'll get annoyed of them and maybe you'll leave the room. But if not, you can hang out there and see the stuff that's being deployed. Just a quick pitch to come hang out with us in our community Slack. I'll add a little more to that. I think Garrett mentioned this earlier too, the invitation of being able to use some of these partners, but also being able to see CI. And you mentioned, Jared, too, about a real-world app
Starting point is 00:34:08 and being able to see how it's done in open source. It's not a very sophisticated app. It's pretty easy to jump in, but the invitation is there, really. I think that's what you're really saying here is that if you're out there listening to this and you're like, how in the world do they do this? And you're listening to this show, follow Jared's instructions. That's how you get in.
Starting point is 00:34:24 You're invited. Everyone's welcome. Come check it out, follow Jared's instructions. That's how you get in. You're invited. Everyone's welcome. Come check it out. See how it works. Ask questions. We love that. That's actually something which I picked up from you when it came to ChangeLog.
Starting point is 00:34:32 Not everybody that runs a business is as open as you are. So not only do you develop in the open of the entire application, but we share everything we learn, right? All the fixes. And I know that Nick, he's Nick Janitakis. He's one of our Phoenix Elixir fans and he's in the Dev Channel all the time.
Starting point is 00:34:55 And he keeps mentioning us on, what was it? Not Stack Overflow. Hacker News. Hacker News. That's the one. Thank you. Thank you, Jared. And so he's very excited to see that he can pick up a lot of the things that we learned.
Starting point is 00:35:10 And he can also get to share when he learns something. Because it's open source, he can do PR. And all of a sudden, our application is better. And everybody benefits. And that's the one thing which I loved about ChangeLog, how open it was. Jared, you were the first one that open sourced the entire new changelog.com application. I thought that was great. And then later on, I did the same for infrastructure, but because it was done so late, it made little sense to the users. I mean, they've seen this finished thing. While with this new infrastructure, the entire
Starting point is 00:35:40 code base, everything we do is in the same repository as the application. We use a monorepo, right? It's one happy party. Right. It's one happy party and everyone's invited. That's right. That's right. Come have fun. Come have fun. That's exactly right. And that's the one thing which I always appreciated, even love, right? The way we did things at ChangeLog. I really, really like that. You can even see what things we have in flight. For example, Raul Tambre, he wanted IPv6 support. And the reason why we did that is because he said, hey, I would like IPv6 support. And this is where we could start. And he even did PRs. And that was one of the nicest features to work with the community. It was really, really nice.
Starting point is 00:36:28 And actually, now that I think of it, PR246, where we made it easier for everybody to contribute to changelog.com, that's what started this new look at how we use Docker and how can we improve our use of Docker so that anyone can benefit from this simple approach. Yes, and the advantage of having an identical setup in development and production, which I've never been the beneficiary of. I always have slight differences.
Starting point is 00:36:53 And Gerhard, you and I had slight differences throughout this process as you were on Linux and I was on Mac and we got to iron out those kinks. But just having the exact same circumstance and even even today i still run the local version and then i'll kick over to the docker the local docker version because i still have my setup my old school setup um and realize you know have an actual clone so to speak of what production is and it's uh it's much better that way you You're converting me. You're converting me into a Docker fan. I'm very glad to hear that. And a little secret is that actually there are three layers.
Starting point is 00:37:35 So the first layer is your local dev setup where everything runs on a Mac or Linux and you need to install on your machine Erlang and Elixir and everything else. Then there's the contributor setup, which is the one that we did in PR246. And that's where we're using Docker Compose. You just make Contrib or Docker Compose up, and then Docker Compose orchestrates all the components. And there's a third approach, which is actually closest to production, where there's a local stack.
Starting point is 00:38:08 The reason why we use a local stack is because some things are different, like certificates, for example. We don't have SSL locally. We could, but it's not something that we run locally. Our SSL, by the way, is terminated by the load balancer. So that's something that even if we did have locally, it would not reflect production. Right. So they're like these three tiers.
Starting point is 00:38:27 I think you're between the first one and the second one. I don't think you're that used to running a local copy of production locally yet, but I think we're getting there. While we're talking about Docker, let's loop back to something that you said earlier, which has to do with running in a loop. I think one of the neatest little hacks, so to speak, little tricks that you pulled off is this self-updating Docker container.
Starting point is 00:38:50 So basically, as you pointed out, once CircleCI has a build, it publishes it to our Docker hub. And that's the entire application, right? And then on the production host, there is a Docker container whose entire purpose is to update the other Docker containers. So... Am I describing it correctly, or just the app container, basically? Just the app container, that's right.
Starting point is 00:39:18 It's called the application updater, and its only purpose is to update the application. But it does that... Is this custom code? Is this a solution you're using, or the application. Is this custom code? Is this a solution you're using? Is this standard practice? I just don't even know. I don't know what it is, to be honest.
Starting point is 00:39:33 I like his giggles before he answers on certain things. It's like he's revealing this thing, but it's not, obviously. He's waiting for us to ask that. These are actually all good questions, which I very much enjoy. I have to say that. So the gist of this application updater is literally a while loop and Docker service update. So it's a Docker service updates running in the while loop, which is like three lines
Starting point is 00:39:59 of code. That's it. That's how simple it is. And the reason why it can be so simple- Where does that code exist? That code exist in a Docker file or as part of the Docker stack? That's it. That's how simple it is. And the reason why it can be so simple... Where does that code exist? Does that code exist in a Docker file or as part of the Docker stack? Where does that actual code exist?
Starting point is 00:40:11 So there is a Docker file that builds the container, the container image, which runs this code. And the image gets deployed with the entire stack, part of the Docker stack, onto production. So you get the application, you get the database, you get the proxy, and you get the application updater. Right. So is the code for the application updater, is this Docker service update running in a loop? Is that code in the Docker file that then gets built as the image? Or does that code
Starting point is 00:40:46 actually exist in our code base? The code, the codes as with everything else exists in changelog.com. Right. It is a script. It is a script in changelog.com, which gets built into a Docker image. Okay. So it's a Docker, like for example, we have a Docker image for the application, which defines what the application is. We have a Docker image for the application, which defines what the application is. We have a Docker image for the proxy where all the legacy assets are stored. So we have a Docker image that has this application update to code,
Starting point is 00:41:14 which is a subset of what we have in the application, in the changelog repository. And it's literally just like a script, a simple script, which has a while loop. Okay, so which file in the repository? I can answer this one. Are we going to find this code Gerhard? Which file is it in? So in changelog.com, there's a Docker directory. Okay. Docker directory is the update service continuously file. That's the question I was asking. There you go.
Starting point is 00:41:47 Sorry, some questions are hard for me. No, no, no. Especially the simple ones. I'm mostly razzing you because it's just funny to ask you to continually drill down on the specifics, but also because I wanted to look at while you're talking. So I just continued to. There it is. Back. What's it called?
Starting point is 00:42:01 Something continue. Update service continuously. Boom. Okay. I'm with you. I got the file open. He's also linking this up in the blog post too, so that's why I'm tracking a little bit closer because when he talks in the blog post about the Docker service,
Starting point is 00:42:14 managing the lifecycle of the app, he points to the running loop via a link, which links to the code that you're talking about, Jared. That's exactly right. I'm staring at that word running in the loop, and I just never clicked on the link. Yeah, I did because I was curious, and I was like, what does this mean? that's exactly right. I'm staring at that word running in the loop and I just never clicked on the link. Yeah.
Starting point is 00:42:26 I did cause I was curious and I was like, what does this mean? That's all right. I think I got my money's worth with getting your heart to answer. It's better. Okay. And so it, is it basically pinging the Docker hub or is it just saying this Docker
Starting point is 00:42:41 service update? I guess right there, do Docker service update dash dash quiet. And dash dash image. Dash dash image specifies an image. If the SHA of the image changes, it will pull the new image and update the service. So you know how in Docker you have images and the images have tags?
Starting point is 00:43:00 So we always track latest. But latest is just a tag that points to a SHA, which is similar to a Git SHA that points to a unique commit. So the latest tag, when it points to a new SHA, then the Docker team knows, hang on, there's actually a new image, even though the tag hasn't changed, but the tag is pointing to a new image. It's almost like master, the master branch, which changes as you have new commits. The latest tag in Docker is exactly the same. It points to a place which changes over time. And when the CI updates the latest tag for the change log image, this loop and this Docker service update knows it has to pull the new image down and update the service which uses the image. So during that time period,
Starting point is 00:43:54 which is pulling down the new image and it's starting the new app container, do we have two app containers running simultaneously, the old one and then the new one? That's right. So who answers the phone when somebody calls? Internally in Docker, there is an IP, which is almost like a gateway, and any request coming in goes to the live app. And the live app is the one that's healthy and has passed all the health checks. The new app, as it starts up, it's not ready to serve requests. So it needs to come up and the health check checks whether port 4000, whether the response is a 200 response. When the new container, the new
Starting point is 00:44:41 app container, when that health check passes, the service knows to update the internal routing to point to the new app instance. It's all automatic. It's all managed by Docker. We don't have to do anything. We used to have lots of scripting that used to make this switch for us, which is still in the old infrastructure repository where all this was kept. It was complicated, it was custom. We're not using that anymore. We're just delegating to the Docker service update, which manages all this lifecycle for us.
Starting point is 00:45:12 Which is smart because we're getting it for free. We've had to write that ourselves before in Ansible concourse land. Exactly, exactly. And now we don't have to worry about that. It's all managed for us. And we will do something similar if we use another system. It's a property that's very desirable. This blue-green deploy never take production down until the new version is ready is very desirable. And we had it for a long time now, even though we did it ourselves, which was difficult to
Starting point is 00:45:40 understand, difficult to maintain. And because it didn't change, things didn't go wrong. But if things started going wrong, it would have been this code that's written a couple of years ago, and it is what it is. So now if it doesn't go right, instead of calling Gerhard, we call Solomon Hikes, or I guess he's moved on. We call Docker and say, what the heck, this health check isn't working, or this Dockercker service update
Starting point is 00:46:05 is failing is the is that the proper scale of the escalation now when we have issues is blame docker i think to a degree yes i mean that's that's the same thing like for example when something doesn't work with linode what do we do like we need to call in out or something doesn't work with fastly. Right. No, we call fastly. And so that's, I suppose the trade off of having someone else do something for you. But I think it's a price worth paying, knowing that you don't have to deal with any of the complexities that go into one thing.
Starting point is 00:46:38 And even though it might look very simple, I mean, if you look at some of our other scripts that we used, I mean, it's not that simple and a lot of things can go wrong. So it's a, it's a nice thing to delegate. Yeah, absolutely. Anytime we can pass the buck, let's, let's pass it right on. One last question while we're down here in the mucky muck of this updater is what's the circum, how does the circumstance work in which a comp, a slightly more complicated push, which also has some database migrations in it. So
Starting point is 00:47:07 how does the system do the update and some sort of manipulation of the database, which is on the block storage? Right. When the new application instance starts, it will run the database migration. And this is not optional, it always does it. If the database migration makes the database incompatible with the running application, the live application, the live application, it won't crash, but parts of it won't work. I mean, they may stop working. But this being Erlang and being Elixir, it's just like basically some processes will start crashing inside the Erlang VM. When it comes to, for example, I think this is mostly in the admin area because most of the website is static.
Starting point is 00:48:00 And once like what the users see, once we generate that static content, it doesn't go to the database. Right. So that is, this is definitely one way of, let's see, screwing production, right? So if you have a bad migration or something that does something, breaking change to the database, it would take your production down. But let me ask you this question. Okay. What would the alternative be?
Starting point is 00:48:31 The alternative would be if I have a bad migration, it would never promote that app container, except that we would have to have a separate database instance or something right so yep because you've already migrated the database so the app container doesn't really matter because the database is in an unknown state um the i guess the alternative would be roll auto rollback uh i know things get things get complicated quickly. I know that. And they get very complicated. And especially with a system like, exactly why we don't do that. Yes, exactly. It's something to be aware of. It's something that, you know, if it happens, it's bad luck, but you always need to be mindful of this
Starting point is 00:49:19 thing. And the alternatives are very costly, both time wisewise, both effort-wise, and do you need that complexity? So as I was mentioning earlier, I work on the RabbitMQ team where distributed stateful systems is the bread and the butter of what we do. Any sort of rolling migrations are extremely, extremely complicated. And that's why, like, how do you, for do you upgrade a RabbitMQ cluster? Most of the time, rolling upgrades work. But when we introduce breaking changes at a protocol level or at a database level, at a schema level, we recommend to deploy something on the side, like Blue-Green Deploys. And if we do that with something like PostgreSQL, imagine setting up a database copy, what happens with the writes that arrive to the data, which is like when database is running in
Starting point is 00:50:10 production. How do you migrate them? How do you basically move them to this new database instance? PostgreSQL is a single instance, not a cluster, which complicates things even further. So it's a complicated problem, which I don't think we need to solve, to be honest. This episode is brought to you by our friends at Rollbar. Move fast and fix things like we do here at Changelog. Check them out at rollbar.com slash changelog. Resolve your errors in minutes and deploy with confidence. Catch your errors in your software before your users do. And if you're not using Rollbar yet or you haven't tried it yet,
Starting point is 00:50:57 they want to give you $100 to donate to open source via Open Collective. And all you got to do is go to rollbar.com slash changelog, sign up, integrate Rollbar into your app. And once you do that, they'll give you $100 to donate to open source. Once again, rollbar.com slash changelog. We didn't go deep enough into monitoring in the last segment, so let's do that now. So we have Rollbar, we have Pingdom, and this new thing I didn't even know existed until literally minutes ago.
Starting point is 00:51:46 So if you look at netdata.changelog.com, this is visibility into our CPU, our RAM, our load, all sorts of interesting stuff. So what is NetData and kind of tail off the monitoring piece of how we run changelog? So NetData is definitely a component we didn't mention. It gives us visibility into system metrics. So what happens on the host, on the VM, on the Linux VM that runs the changelog application and the database and all the other components that make up changelog? I would have to mention logs as well. So logs and metrics, they kind of go hand in hand. When it comes to, actually there's one more component, which would definitely be the exceptions, the application exceptions, which I already mentioned Rollbar. By the way, we mentioned Rollbar to track errors as well,
Starting point is 00:52:28 application errors, application exceptions, and also to track deploys. So they kind of go together. Because if there's code in a deploy that's bad, you want to track it back to an error, etc. Exactly. And you want to see how often those errors happen, when they started happening,
Starting point is 00:52:43 when which deploy it was introduced, and so on and so forth. And Rollbar is really good at giving you that visibility. When it comes to logs and metrics, I mean, we mentioned this even two years ago, we aggregate all the logs from the entire infrastructure and we ship them to Papertrail. Papertrail now is together with Pingdom, they're part of SolarWinds, SolarWinds Cloud, they're like this nice observability stack. So the logs we ship them to Papertrail via LogSpout and metrics we delegate it to NetData which is this amazing open source product, free, right, we love free. NetData is completely free, completely amazing in that it gives you per second metrics. There are very few
Starting point is 00:53:30 monitoring systems, metric systems that give you that level of visibility. And not only we see the CPU, the network, we can see for example TCP sockets. And when we first introduced IPv6, the one thing which we noticed, this was on the old stack, by the way, we had a TCP socket leak. And it's something which NetData made very easy to see. So if you go into the pull request, which is, again, public, where we discuss, sorry, it's the issue that we'll discuss this IPv6 support. When we first introduced this, there was a leak, Raoul that requested the feature, he could see it and we could mention it and we could discuss around metrics. So we see very detailed system metrics and we can also see per container metrics. So we can see,
Starting point is 00:54:20 for example, the application container, how much CPU it's using, how much memory it's using. And it's all real time. It's all per second. And that means that we have real-time visibility, but for a limited duration of time. So currently, we only display metrics for the last hour, and that's it. And the reason why we do this is because the metrics are stored in memory. And even though we could give it more memory, we limit it to this one hour worth of metrics because we're low on memory well no we're not low on memory we could definitely i know i know that's why i say it that way so uh we're definitely not low on memory we have lots and lots of memory bucket loads of memory, but the more we store in net data, we could do with storing in another system which was built for historical metrics, which is Prometheus. So the Prometheus and Grafana, they also form part of an observability stack, which I'm
Starting point is 00:55:19 very excited about, and that's something which I'm hoping that we'll be able to do in the future, which would give us a long-term visibility. We'll see how things change over time in the entire changelog stack. So this net data, is it just running in another container on the host? Yes. And so if we eventually said, okay, it's time for Prometheus and Grafana, would you just set those up as other containers on the host? That's correct, yes. I'm learning things. That makes sense.
Starting point is 00:55:49 So, okay, long-term metrics coming out of Prometheus is a nice to have down the road. In the blog post, you also mentioned business metrics. I'm not familiar with these tools. I know we did a show on Prometheus probably three years ago, but that doesn't mean I remember any of it. And I'm here to tell you I don't. So give us the give us what are you talking about business metrics? How could we use this beyond just net data, but longer than a day? was downloaded and when it was downloaded. This is something that we could track like downloads, which could be like a rate of episodes. We could track them over time and we can aggregate all downloads across all episodes. That's obviously just one type of metric that we could have. We could also track when do users stop listening, for example, to mp3 files,
Starting point is 00:56:46 like how much of them they download. And we could store all these metrics alongside everything else in a system like Prometheus and then we'd use Grafana to visualize those metrics. So literally anything that you want to track long term, we could store it in Prometheus, which is like a metric storage system. And we could visualize it using Grafana, which is a metrics visualization system. So these are metrics that we care about, obviously. And we are currently doing that work, but we're doing it in application. So is it easier in Prometheus than the way that I'm doing it with my Elixir codes? I think it would be a matter of delegating that responsibility to something that was
Starting point is 00:57:30 built for metrics. Prometheus, for example, it's suited for metrics like high frequency metrics, lots and lots of metrics that continuously change. We could also use something like InfluxDB, for example, which is another system also for storing metrics. It has a slightly different target audience, and that might be better suited for business metrics, and that has maybe queries which are like a SQL query.
Starting point is 00:58:00 You can run SQL queries, which I think would be better suited for the business metrics. But I'm pretty sure that we can make Prometheus work for us for both types of metrics rather than having these separate metric system running. And I think that in FluxDB, I think only the core is free. I think there's like a paid version. I'm not sure on that because I've only used it a long time ago before it went all commercial. This was pre-version one era. I do use Prometheus every day. Actually, all the RabbitMQ metrics, there's a new feature coming out which will be using Prometheus heavily and Grafana heavily, and it's excellent for those types of metrics, system metrics.
Starting point is 00:58:49 The Phoenix application being an Erlang application, there's a lot of stuff that we could use for changelog itself, which maybe we all need that level of detail, but it's nice to know that we could do it if you wanted to. And it's already been done for us. We don't have to reinvent the wheel. We can just reuse something in this context.
Starting point is 00:59:05 So for now, I'm just dumping everything right into postgres and basically using good old sql to slice and dice it into things that we that we want to see that being said it's very manual i mean if we want to have a new view in fact i just added a view today today yeah today of the a graph of the uh of all episodes first seven days reach so the basically the launch data reach for episodes on a graph and it's like that's us that's a feature that i have to develop i would love to have a tool, maybe similar to Metabase, Adam, where we're dumping the information into some sort of raw storage and then it's sliceable and diceable in ways that are more ad hoc or more like a reporting tool. Is that Prometheus or is that Grafana or is that neither of those?
Starting point is 01:00:01 So Grafana would be able to visualize metrics and it has this concept of uh back end sorry data data sources so you could for example use grafana with influx dv with prometheus it even has even supports like stackdriver which is like a google product so it supports these different like storage metric. One of them is Prometheus. I would need to take a closer look at all your metrics. I'm very familiar with Prometheus, and I would know what it can and it can't do that well.
Starting point is 01:00:35 Most metrics, like I haven't come across a metric that Prometheus can't store or you can't use it for. Maybe, for example, InfluxDB would be more efficient. Now, do we need that? I don't know, maybe. I would definitely need to take a closer look at the metrics. But what I do know for sure is that if you are writing code that manages metrics, you would be better served using a system that was built for that and maybe writing code that is specific to your business needs. So in your case, like, for that, and maybe writing code that is specific to your business needs.
Starting point is 01:01:06 So in your case, like for example, those ID3 tags and those FFmpeg, I would be so happy if we could maybe switch to object storage and not use block storage for that type of media and for that type of static content, rather than maybe spend the time doing metrics related work. It's a bummer to have you build out features that could potentially serve more people than just us.
Starting point is 01:01:36 But also theoretically commit free because if as a user I have questions and I want to pull the data source without having to know SQL or have access to the server and I can do it just in the data source itself, it gives us more flexibility. And plus, as Gary just mentioned,
Starting point is 01:01:56 frees you up to do more high-value things. Right. Exactly. One thing which I would like to mention, and this is very relevant, actually two things. Grafana 6 came out, and it has this new amazing feature. When I say came out, it came out in February. So it's been, I suppose, two months out. It has this new feature where it allows you to explore metrics. So you're saying about having to write these queries or having to write these codes
Starting point is 01:02:21 to see metrics in a different way. Well, Grafana has this feature which allows you to explore metrics and play with metrics and just to see what data do you have, what metrics do you have, and which way you can combine them. So that's the one thing which is very cool. Obviously, you can build dashboards, and dashboards are more static where we can give a couple of examples, we can link them in the show notes. But the other feature which I'm very excited about is Loki. Loki is this new Grafana. It's part of the same stack, Grafana Labs, and it's for log aggregation.
Starting point is 01:03:00 So all of a sudden we can ship our logs to Loki, which manages them, and it shows them in the context of the metrics. So when we see some, for example, maybe our database is running slow, or our application is running slow, or it's crashing, or whatever may happen, we not only can see the metrics that correspond to those misbehaviors, we can also see the logs, which will give us more insight. So this combination is great to have. And not to mention, now that you have the business metrics in the same system, you can overlay the business metrics alongside your infrastructure metrics,
Starting point is 01:03:41 your application metrics, and your application logs. So you can see the impact that the database being down, for example, if it was to happen, what impact does that have on the audience or in shows or whatever it may be? And I don't mean just like short-term, I mean long-term impact. Yeah.
Starting point is 01:03:59 Maybe ignorance is bliss though, because once I find out, you know, what that line of code, what that bug is actually costing us in terms of listeners, might want to find a new career oh boy just kidding that that does sound pretty cool so you there's some other things that now i'm wanting you to to help us actually have a good first maybe test case for from medias we can talk about that maybe offline which is a simple metric that's a business metric that I'm not tracking yet and I want it to track, but I do not want to add it to our current Postgres setup.
Starting point is 01:04:30 So maybe that'd be a good one for Prometheus. What else? As we look to the future of change.com, we've made big strides, we've simplified, we've switched things out, we've decoupled a little bit from certain aspects of our stack. There's a lot that we didn't do. And one thing that always comes up and what Adam asks is why not Kubernetes? We asked this last time around. Let's go ahead
Starting point is 01:04:53 and ask it again. Why are we not using Kubernetes? Okay, go ahead. Why not Kubernetes? Why Kubernetes? Why not? We could be here all day, right? All night. Okay. So the simple answer to that is that Kubernetes two years ago was hard. Only recently has managed Kubernetes become widely available. Linode, for example, doesn't offer managed Kubernetes yet, but it's almost there. They're very, very close to having managed Kubernetes. What that means for us is that we get to use it; we don't get to worry about upgrades, about when things fail, and so on and so forth. DigitalOcean, for example, already has a managed Kubernetes offering, and that's great. So, you know, that's something maybe worth considering. But what we definitely do not want to do is worry about our Kubernetes deployment. We just want to use it. Two years ago, we would have to go to Google or some, you know, big vendor to get that.
Starting point is 01:05:58 Nowadays, DigitalOcean, Linode very shortly, and others have it, which is great for us. So managing Kubernetes yourself is a very difficult thing and requires dedicated resources. Is that why we said no to that? Oh, yes. Yeah. Oh, yes. And the learning curve is very steep and things are done in a certain way. It's just another layer of abstraction.
Starting point is 01:06:21 It's almost like we were using Concourse, which is way too complex for what we needed, while CircleCI was good enough. As an analogy, we're using Docker, for example, and Docker Swarm, which is good enough for what we need. Kubernetes would be nicer. And because we have these managed Kubernetes offerings and managed Kubernetes services, it's something that we could definitely benefit from. So if you're going to use Kubernetes, though, you want to go in a managed scenario rather than trying to run it yourself in most cases, unless you're extremely rich in terms of the business and have endless resources.
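For reference, the Docker Swarm mode called good enough here really is just a handful of commands; the stack file and service names below are made up for illustration.

```shell
# Turn a Docker host into a one-node swarm; the printed join-token command
# is what each additional VM would run to become a worker.
docker swarm init
docker swarm join-token worker

# Deploy a stack (the app, database, and proxy would be defined in
# docker-stack.yml) and spread one service across the joined nodes.
docker stack deploy -c docker-stack.yml changelog
docker service scale changelog_app=2
```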
Starting point is 01:06:58 Yes. And you need something even more specific, custom, for example, you need to build custom Kubernetes resources, which is like a world of its own, very complicated. But if that's what you need, that's what you need. In our case, we don't need that. Now, I have a secret to reveal. I've kept some of the best parts for last. Oh, please, share them.
Starting point is 01:07:19 All right, so with Linode, before, two years ago, we were using Ubuntu. And because we were using Ubuntu at the time, we had to manage Docker ourselves. And that was a pain. In this infrastructure, we're using CoreOS, which is a container OS. And what that means is that it comes with Docker pre-installed. It has this nice feature of auto updates, which we don't use; we don't need to go into the details. By the way, all of this is in our repository. We have all the reasons; we have discussions with the Linode team about why we didn't do it, and other alternatives. But the point is, we don't manage Docker. Docker is a managed thing
Starting point is 01:07:57 for us, and that's very nice. Any updates, anything like that, we don't have to worry about. We don't even have to worry about installing it. On Ubuntu, the first thing you need to do if you want to use Docker is install it. On CoreOS, we don't have to worry about that. So because we no longer need to manage Docker, all the things that we used to do in the old infrastructure, we no longer have to do. So a lot of that code is no longer relevant in the new world, which is really nice. Because we are running Docker in swarm mode, we have a single instance of Docker, and we should have more, for sure. And to do that we need to change a couple of things. For example,
Starting point is 01:08:39 right now, when we provision the block storage in Linode, we do that using Terraform. By the way, we use Terraform to manage everything. I didn't mention that earlier on, but it's a nice little thing to have, and it's very simple as well. We love that. So rather than having Terraform manage these block storage volumes, we would need to use a plugin for Docker, which, by the way, Linode definitely has and wrote for us, sorry, not for us, wrote for their users. And that was very nice to see, which would allow us to, for example, use Docker Swarm. We have multiple nodes, and as applications or containers move from VM to VM, the volumes would move with them, which is very nice. And all this really is the core that sits in the managed Kubernetes wrapper. Because with Kubernetes, there are all these components which give you higher-level abstractions over something that runs containers. And in this case,
Starting point is 01:09:43 it's Docker. So you need Docker and, okay, you can replace it with something else, but you need something that runs those containers. And then you can use Kubernetes that gives you a high-level API that allows you to define things in a way that we do, but a bit more complex, where you can define the entire stack
Starting point is 01:10:03 and what it means for all these containers to communicate in all the networks and all the services, as I was mentioning earlier. Which project is that from, Linode? Is that KubeLinode, or is that Linode Cloud Controller Manager? I'm not sure. Let me see. So on Linode, there is actually developers.linode.com forward slash Kubernetes, and they have the Linode CLI, which you can use to create a Kubernetes cluster. That's what I was looking for. Okay. Yep. Which underneath, I mean, it just
Starting point is 01:10:36 deploys the type of image that we use for our VM, but it does a couple more things. It sets up, for example, the plugin that manages the block storage. It has other plugins or integrations with the Kubernetes components that integrate, for example, with the NodeBalancers. So we can define more through the Kubernetes API and manage less via Terraform. So in a way, this is a stepping stone to managed Kubernetes, but it's a smaller step rather than the bigger one, which would have taken more. Love it. Yeah, getting there sooner than later. I mean,
Starting point is 01:11:12 2016 infrastructure, now we're finally on Kubernetes. Great things, I guess. The sky's the limit now. Yep, pretty much. Maybe on the closing side of things, where are we lacking? We talk about the future of where we're trying to go. Obviously, we're not done. Typical software, it's never really done, is it? So we're always improving. But where are we currently not as optimized?
Starting point is 01:11:36 Say, maybe on SSL or HTTPS, things like that. Where are we lacking that we could be improving? So one thing which I'm constantly thinking about is: what are the things that I have to spend time on that are not automated? And one of those things is the stateful services which need updating. For example, PostgreSQL. To update it, it takes a lot of effort currently. And when I say a lot of effort, I mean a couple of hours. I don't mean days, but still, it's effort that we should not spend. If you could define how the update should happen and what the rules for the update are, then since we have a cluster, we could have multiple PostgreSQL instances and automatic rolling upgrades. That would be very nice to have. There's PostgreSQL, there's Nginx, there's all these
Starting point is 01:12:38 components which are auxiliary to the app but are also part of the changelog stack. So that's one thing which I would definitely like to improve, because it's still a manual process. We build Docker images, and we would like to automate that aspect. The other one is HTTPS, for sure, and IPv6. We are almost there with IPv6. We have it enabled on Linode. We have the DNS entries. We also have it enabled on the CDN, so Fastly. We're already using their IPv6 feature. There are still some links in the blogs, for example. We have some images that we load from S3, and we are not using the IPv6 URLs, which we should do, but that's a small thing. A slightly related thing is HTTPS. I'm going to say it's slightly related because once you're on IPv6, you want to use HTTP/2,
Starting point is 01:13:39 and to have HTTP/2, you need to have HTTPS. So everything needs to be encrypted. To do that, we currently have a certificate that we manually have to renew and manually have to install. When I say manually, we just have to put it in LastPass, where we store all the credentials. And then when you run Terraform, it just gets set up and configured on the NodeBalancer on the Linode side. It would be great if we could use Let's Encrypt, which I'm a big fan of, which gives you free SSL certificates. It's a great community effort, it's a great industry effort, and it's something that, as open source champions, we should definitely be using. It's one less thing to worry about. Does our certificate run out? It doesn't; by the way, we
Starting point is 01:14:23 don't have to worry about that until 2020. But it'd be nice to have automated SSL certificates for us via Let's Encrypt. You know, we could just throw the old SSL certificate away and move over before 2020. We don't have to wait till June. Just saying. I know, I know. But we need to set up the integration with Let's Encrypt, right? So how do we use the Let's Encrypt certificate?
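The does-our-certificate-run-out worry is easy to see in miniature with openssl; here a throwaway self-signed certificate stands in for the real one, and this is exactly the renewal check that Let's Encrypt plus an ACME client automates away.

```shell
# Create a short-lived self-signed certificate purely for demonstration.
openssl req -x509 -newkey rsa:2048 -nodes -subj '/CN=demo.test' \
  -keyout /tmp/demo.key -out /tmp/demo.crt -days 90 2>/dev/null

# Print the expiry date, then verify it is valid for at least 30 more days.
openssl x509 -in /tmp/demo.crt -noout -enddate
openssl x509 -in /tmp/demo.crt -noout -checkend $((30*24*3600)) \
  && echo 'certificate valid for at least 30 more days'
```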
Starting point is 01:14:50 And it would be nice if, for example, Linode had this feature in their NodeBalancers. So this is where, for example, Linode can do the integration for us, right? And for, obviously, all their customers. And then everyone gets to benefit. So is that in place now, or is that being worked on by Linode? Well, this was a feature request. I just made it live on the show. There you go. I'm sure Marcus is listening. Marcus, not Marquez. Marcus. Yeah, that's what I... Marcus Johansson. Yeah, we've had good conversations. He's a great advocate inside of Linode for us as
Starting point is 01:15:23 well, just to, you know, listen to the show, love what we do, and then obviously wants to see us thrive. So that's awesome. I have to say, a lot of the questions and a lot of the things that we had to work out through the Linode API, and by using the integration, which he helped build a lot of these components, we were in constant touch with him, and he was a great, great Linode representative and Linode developer to help us with a lot of things, and obviously improve things for Linode as well
Starting point is 01:15:53 as for all their other users. So SSL being improved, H2 being used, we've got some things we had to improve upon. What about the CDN? I know we're kind of late in the conversation on that. We had some weird slowdowns. Is that worth touching on at all? I mean, there's a lot of work that went into that to make it faster.
Starting point is 01:16:13 And maybe it seemed like it should have been more straightforward, but it wasn't. Are you referring to the 503s? Yeah. That was very low level in the networking stack. It was a layer two, layer three problem, very low level in how routing works and how packets get lost and routes aren't updated correctly and stuff like that. And that was in a period when we were transitioning between infrastructures. So this was happening still in our old infrastructure. And I think since our new infrastructure, the
Starting point is 01:16:41 problems mostly went away. But now that you mentioned that, Adam, one thing which I would like us to do, and this is somewhere where we somewhat disagree with Jared, is to cache more by the CDN. So that if the VM is down, for example, or like if Linode was to have like an issue in the data center, our static content that was cached would still be served. So changelog would still be served. So changelog would not go down.
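Much of this serve-stale-when-the-origin-is-down behavior comes down to a header contract with the CDN; the directive values below are illustrative assumptions, not what the app sends today.

```shell
# A Cache-Control header the origin could emit for public pages. s-maxage
# tells a shared cache like Fastly how long a copy stays fresh, and
# stale-if-error lets it keep serving that copy for up to a day if the
# origin starts failing.
printf 'Cache-Control: public, s-maxage=3600, stale-if-error=86400\n'
```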
Starting point is 01:17:11 As you said, too, most of the stuff we do is somewhat static. I mean, once it's out there, it's sort of done unless there's an update, which is fairly infrequent. I do not disagree that we should do that. I disagree... how do I say it? I realize the costs of making that change versus the value of doing other things which are higher priority. I agree with you that it should happen. It requires us to change the way that we're doing some of our programming in order to go completely behind the CDN, and so that's why I've delayed on it. I don't... in
Starting point is 01:17:45 principle, I'm with you, even on speed. I would love to have all of our content delivered from the CDN, especially our content that is mostly static upon read. I mean, we write it, and then it's published. We are caching things in the app, but on getting them into the CDN, I agree with you completely. That being said, the way that the app is built, and the way that we like to keep things simple, with the ability to customize responses for signed-in users, it just requires some heavy lifting to make that particular change. And so there's just lower-hanging fruit. That's my contention. So this is a great example, Jared, of understanding the landscape really well, and knowing which steps are worth taking. So I think the CDN is very similar to my Kubernetes
Starting point is 01:18:34 in that I understand the value. I know that we should do it. But there are a couple of other steps, smaller steps, easier steps, that we should take first. And I think we have been taking them; with the CDN, things have been improving, as Adam was mentioning. We're not there yet, not for lack of wanting it, but we understand the complexity that goes with that change. And this is the value of having a team, and being a team, and working together with people who have different domain expertise or technology expertise: we have a dialogue, and we have a push and a pull, and we can disagree, and we
Starting point is 01:19:12 can state our cases, and we can move forward together with our collective knowledge and experience, versus just making all the decisions yourself, which is why I reached out to you in the first place, Gerhard, a
Starting point is 01:19:30 Yes, I could trudge through and figure it out. It wouldn't be anywhere near as good. And it would have taken way longer to get deployed. But the long-term benefits is like, we bring other things to the team, right? And we build and grow together. And so that's, as a person who works alone a lot, it's just an enjoyable aspect of ChangeLog
Starting point is 01:19:52 that I think is an example here with some of our minor disagreements around what is a higher priority? Adam, you have your opinions on what we should be doing next and these are things that we discuss along the way. I think what's interesting, too, is I've learned so much more about deployment where it seems like it's fairly easy. Oh, just put the code on the server, and there you go.
Starting point is 01:20:14 It runs, right? There's so much more behind the scenes to having performant production code that's monitored, that has failover, that has all these things that are concerns for a modern app, that not everybody really deeply understands or considers. And I thought, you know, deploying is pretty easy, right? But clearly it's not. It's a lot.
Starting point is 01:20:37 It's very sophisticated. I think that's the way it should be for most users of the system and for most developers. But someone somewhere needs to worry and needs to solve the hard problems. And the more experience you gain in a certain area, the more you realize, well, actually, this is a lot more complicated than meets the eye. And if you don't like that type of work, it's really hard to do good work and to solve tough problems for users. And that's why some systems fail in weird and wonderful ways because they have a lot of sharp edges.
Starting point is 01:21:10 People haven't thought long enough or hard enough, or they haven't done things the way that needs to be done for things to be easy. So when we consume, for example, Linode services or Google services via an API, or AWS, we say, oh, that's easy. But there's an immense amount of work that goes on behind the scenes that most users aren't aware of. And that's the way it should be. Yeah. How about this?
Starting point is 01:21:33 We'll turn it back on the listening audience, too, because I'm sure there are many, many more opinions out there that we're not hearing. So if you've heard this conversation and you've read this blog post and you've examined our source code and have trudged through issues and different PRs and have, you know, gotten some different insights, and you see that we could be taking different steps, help us plot out a roadmap. Join the community,
Starting point is 01:21:55 changelog.com slash community. Pound dev is where things are happening; pound SRE is there. I just learned about that one today. It's pretty interesting. So if you want to chat with us or share feedback on new tooling, new services that we should put on a roadmap, please reach out. As with all things, we fly by the mantra of slow and steady wins the race, so we're in no rush to get there. We launched our latest platform in 2016, and we've improved upon it every year since then, basically. Thanks to Gerhard's and Jared's hard work for making that possible. But we invite you, the community,
Starting point is 01:22:31 to share your thoughts as well. So join us in Slack. Join us in issues on GitHub, or even discussions on podcasts. It's pretty easy. I would really like that too, I have to say. I would really enjoy hearing some feedback from the users, having put all this work out and having
Starting point is 01:22:48 made everything available. I would really appreciate knowing what you think, having different viewpoints. I always want to learn and I'm sure there are better ways or different ways of doing things that I would like to know about.
Starting point is 01:23:04 Yeah, absolutely. And all are welcome. Please come. As Jared mentioned, the three frees: go to changelog.com, sign up for free, get in Slack for free, and hang out in pound dev for free. It's all free around here. Enjoy the party. We love it. Gerhard, Jared, thank you so much for all your hard work on changelog.com. It's tremendous how far we've come. I can't even believe we were once on Tumblr. Wow, man, that's crazy. And today, such a different world. And it just shows that we're true to our motto: slow and steady wins the race. So keep pushing forward. Thanks, fellas. Great show. Thanks, Gerhard. Thank you both. Thank you. Bye. All right, thank you for tuning in to this episode of the ChangeLog. Hey, guess what?
Starting point is 01:23:50 We have discussions on every single episode now, so head to changelog.com and discuss this episode. And if you want to help us grow this show, reach more listeners, and influence more developers, do us a favor and give us a rating or review in iTunes or
Starting point is 01:24:05 Apple Podcasts. If you use Overcast, give us a star. If you tweet, tweet a link. If you make lists of your favorite podcasts, include us in them. And of course, thank you to our sponsors: Linode, GoCD, and Rollbar. Also, thanks to Fastly, our bandwidth partner; Rollbar, our monitoring service; and Linode, our cloud server of choice. This episode is hosted by myself, Adam Stacoviak, and Jerod Santo. And our music is done by Breakmaster Cylinder. If you want to hear more episodes like this, subscribe to our master feed at changelog.com slash master. Or go into your podcast app and search for ChangeLog Master. You'll find it.
Starting point is 01:24:46 Thank you for tuning in this week. We'll see you again soon. Because you've listened all the way to the end of the show, we've got a little preview here for you of our upcoming podcast called Brain Science. This podcast is for the curious and explores the inner workings of the human brain to understand behavior change, habit formation, mental health, and the human condition. This show is hosted by myself, Adam Stacoviak, and my good friend, Mireille Reece, a doctor in clinical psychology. It's brain science, applied.
Starting point is 01:25:25 Not just how does the brain work, but how do we apply what we know about the brain to better our lives? Here we go. That applied brain science really stood out to me, because I don't want it to just be data. I want you to go, how can this fit? What can I take away now? How am I going to change? And that sort of is where you come in more. And even some of the questions, like... so I want to ask you, what are some of the most
Starting point is 01:25:48 challenging things working in the tech world when it comes to relationships? Probably the most important one is isolation. More and more of the world and companies are being, for good reasons, they're being okay with what they call distributed teams. Yeah. And that means that you and I, we work for the same company, but you work from your home office. I work from my home office.
Starting point is 01:26:08 I might go into the office a couple times a week if I live local. But even if I live in San Francisco, I'm still probably a remote worker, even though I can hop in an Uber or hop on the train or whatever and go into the office and be there in a half hour. But why waste the time? You know, and this is where I would revisit what I want to talk about with resonance. And that whenever we're learning, no matter what thing, it's really helpful when we get feedback that's both immediate and specific.
Starting point is 01:26:37 And so when you're by yourself and you don't have any interaction with other people, how can you get any feedback? I mean, you're losing most of the nonverbal communication and you also don't have all of the voice inflections or facial expression. Have you ever tried to be sad, feel sad and smile at the same time? Try it. It's pretty hard. Right, because facial expression is exactly what's involved when it comes to empathy, which is relationships.
Starting point is 01:27:13 I was reading a research article recently, and it talked about, you know, how couples who are together a really long time end up sort of looking like each other. I've heard that. Yeah.
Starting point is 01:27:35 And so when you empathize with the partner you're with over and over and over again, your face begins to make the same creases and facial expression as it relates to where somebody else is emotionally. Wow. Right? Say it again. So that's, that's creepy. Well, they've, again, this is sort of the hotbed when it comes to neuroscience these days is mirror neurons. And these mirror neurons are what are involved with empathy. And so mirroring, meaning I get another person's emotional world. And so one of the research studies looked at Botox. And what they found is that Botox, because it actually assists in paralyzing facial muscles, then you can't contort your face
Starting point is 01:28:25 so you don't get wrinkles. But actually levels of empathy go down. Uh-uh. Right. Because your physical appearance can't reflect your inner appearance. Yeah, you got it. And so when you're working in these remote locations,
Starting point is 01:28:40 it might facilitate better work or more focus. And it allows people to be distributed and to capitalize on the talents across the country. Right. Yeah. Wow. So that's like a treasure trove, in my opinion, talking about it in a scientific way, you know, not just like, hey, this is my opinion about all the cons of that. Because I think what we can do is still have remote work, but do it in more healthy ways. Because I'm fully, I mean, I've been a self-employed remote worker since 2006. Now, I'm a unique animal.
Starting point is 01:29:14 I know that. My wife knows that. Right. I'm fine with it. I'm a good human being, but I've got some flaws, and I'm willing to accept and share those to some degree. And I think the problem is we just lack maybe a more purposeful or intentional feedback loop. Yeah. Which I think is super important to being able to operate in this world in good ways. I don't know; healthy ways is probably the best term to use in this show's context. Healthy ways.
Starting point is 01:29:42 One of the things that's fundamental, I would say, to being human is change, right? And so sometimes people come in and are really key in our life for a period of time, and then things change. Either we grow or they grow or they change in a different direction, and then the relationship changes or that feedback loop gets modified in some way. That isn't always a bad thing. It's just going, my sense of choice actually is a critical component when it comes to feeling good about my life. If I feel like everything is sort of outside of me and I don't have any charge over it, like I didn't choose to work in a more remote location or I didn't choose to go to school or I didn't choose this person. Then it feels far more oppressive as opposed to I actually participated in the outcome that I'm actually experiencing.
Starting point is 01:30:35 So I then also have more charge over whether or not I want to change it. I think this feedback loop process that we're talking about here is super common to developers, they have this concept of agile, and basically it means you produce something, you put it out there, and you expect the feedback loop to happen in order to gain insights and course correction to then release another version of it that continually and iteratively becomes more and more improved. So this whole process in day-to-day work in software is normal. And I think it's interesting how it can apply to their lives and people's lives, you know,
Starting point is 01:31:31 to take the same importance of a feedback loop, for example, and apply it. Right. Well, so this is very much how it goes in relationship, which is why there is an importance when it comes to sort of things resonating. You ever walk into a room or an interaction with a couple other people and like something just feels wonky or off? You're like, I can't put my finger on it. Definitely been there. Right? Well, and so to be able to identify that in relationships and even go,
Starting point is 01:32:01 wow, I need to, I'm experiencing this person in my world with the limited interactions that I have with them. It hasn't really resonated with me. And so I don't get good feedback. So now I'm going to be more defensive because I feel as though there's a threat. It doesn't necessarily mean the person is threatening. However, my brain is going to tell me, hey, we need to be more protective. We need to do some strategies so that you're not fully exposed. One way I look at scenarios like this, I would say as of late, is because if you ever watched a TV show or a movie where the narration, the storytelling part of it, they expose a character in a certain light. And you may dislike that. They may be a villain or villainess, right?
Starting point is 01:32:50 Sure. But the moment they turn the story to their backstory and why they are the way they are or why they're acting the way they're acting. Yes. You then kind of fall in love with them. You're almost rooting for them. Right. I feel like that's the same thing that happens day to day to our lives is that there are people who seem villainous or not for us, but we don't understand their backstory and why they are the way they are for us to have and employ that empathy that's required to have this dance, as you say, this iteration of relationship, you know, we just assume they are who they are and we project, you know, our worst fears onto them and they become true.
Starting point is 01:33:31 Yes, you got it. This is why in the absence of, you know, a face, I don't really get to engage with people in the same sort of humanness that we are all in. And so you're exactly right. I mean, over and over and over again, because you can identify and go, oh, that's why they're harsh. Or, you know, I recently had an interaction I had shared with someone that I was a competitive gymnastics coach for a number of years. And so somebody thought that my response to them when they were really struggling was kind of harsh, but they remembered that I had told them I was a coach for so long. And they're like, oh, this is just another side of her coming out. Right.
Starting point is 01:34:16 And I'm not sure I prefer it, but I get it. And then it switched for their reaction because then they're like, oh, wait, we're on the same team. She's not trying to, like, oppress me or fight back against me. She actually is helping me, trying to get me to where I want to go. My wife and I, we've learned this concept of goodwill, right? Yeah. I can take your feedback or your criticisms in a different light if I know that you have goodwill for me, meaning that you're not trying to harm me,
Starting point is 01:34:48 that you are for me, not against me. And sometimes change, as we all know, is painful and can be painful. So sometimes the necessary feedback and or criticism that can influence that change can also be painful, but I can accept it differently.
Starting point is 01:35:02 if I know that she, or they, or whomever is in the scenario with me has goodwill for me. You know, whereas if you know that they're not for you, then you obviously take it a whole different way. And that's an okay thing. But we often are, you know, in relationship with people that are giving us crucial feedback, and we need to have that kind of lens. Like, it was significant in our marriage to understand, hey, I know there are times when you give me feedback. I am not happy about it, but I know you have goodwill for me. So therefore, I calm down.
Starting point is 01:35:36 I listen. I take that in and I process it, whatever. But I take it in a different way because I know that she's for me and not against me. Yep. One of the key things when it comes to change is a sense of openness and even relationally, like of going, I need to be able to see how somebody else responds or how they're feeling as based on their perspective of what they're going through and not just my perspective of their perspective. And so this goodwill is like, I believe that we're on the same side
Starting point is 01:36:13 and that you're not trying to make it harder for me. But so I can understand if I were sitting where you were sitting, had the background that you had, why you would have taken it in that way. And then I can provide an opportunity to clarify or create more connection, even when it doesn't feel good. And I honestly think this is so much of what's missing in people's relationships. If I look at relational interactions through the notion of conditioning, wherein I get a sort of hit of dopamine, feel good feelings because I went to a person,
Starting point is 01:36:49 I had a conversation that didn't necessarily feel good, but there was openness on both parties to hear one another's perspective that it actually then reinforces like, oh, when I go and I have this exchange with people, I feel better. So now I'm going I go and I have this exchange with people, I feel better. So now I'm going to go and engage with other people and get the feedback, even if I might not like the feedback, because now I'm buffered
Starting point is 01:37:14 and I'm not alone in this, and somebody else sees my world. That's a preview of Brain Science. If you love where we're going with this, send us an email to get on the list to be notified the very moment this show gets released. Email us at editors at changelog.com. In the subject line, put in all caps, BRAIN SCIENCE, with a couple bangs if you're really excited. You can also subscribe to our master feed to get all of our shows in one single feed. Head to changelog.com slash master, or search in your podcast app for ChangeLog Master. You'll find it. Subscribe, get all of our shows, and even those that only hit the master feed. Again, changelog.com slash master. Thank you.
