The Changelog: Software Development, Open Source - Inside 2019's infrastructure for Changelog.com (Interview)
Episode Date: May 5, 2019
We're talking with Gerhard Lazu, our resident ops and infrastructure expert, about the setup we've rolled out for 2019. Late 2016 we relaunched Changelog.com as a new Phoenix/Elixir application and that included a brand new infrastructure and deployment process. 2019's infrastructure update includes Linode, CoreOS, Docker, CircleCI, Rollbar, Fastly, Netdata, and more, and we talk through all the details on this show. This show is also an open invite to you and the rest of the community to join us in Slack and learn and contribute to Changelog.com. Head to changelog.com/community to get started.
Transcript
Bandwidth for ChangeLog is provided by Fastly.
Learn more at Fastly.com.
We move fast and fix things here at ChangeLog because of Rollbar.
Check them out at Rollbar.com.
And we're hosted on Linode cloud servers.
Head to Linode.com slash ChangeLog.
This episode is brought to you by Linode, our cloud server of choice.
And we're excited to share the recent launch dedicated CPU instances.
If you have build boxes, CI, CD, video encoding,
machine learning, game servers, databases, data mining, or application servers that need to be
full duty, 100% CPU all day, every day, then check out Linode's dedicated CPU instances.
These instances are fully dedicated and shared with no one else, so there's no CPU steal or competing for these
resources with other Linodes. Pricing is very competitive and starts out at 30 bucks a month.
Learn more and get started at linode.com slash changelog. Again, linode.com slash changelog.
All right, welcome back, everyone.
This is the ChangeLog, a podcast featuring the hackers, the leaders, and the innovators of software development.
I'm Adam Stachowiak, Editor-in-Chief here at ChangeLog.
On today's show, we invited Gerhard Lazu back on the show, our resident ops and infrastructure expert, to talk about the setup we rolled out for 2019.
In late 2016, we relaunched ChangeLog.com as a new Phoenix Elixir application,
and that included a brand new infrastructure and deployment process.
2019's infrastructure update includes Linode, CoreOS, Docker, CircleCI,
Rollbar, Fastly, NetData, and more.
And we talked through all the details here on today's show.
This show is also an open invite to you and the rest of the community to join us in Slack and to learn and to contribute to Changelog.com as you see fit.
So head to changelog.com slash community to get started, and I'll see you in Slack.
It is 2019.
It's a new year.
It is.
Obviously.
We're way into this year, as a matter of fact.
Q2.
But in terms of infrastructure, we're kind of new, right? This has been out for a few months now.
Gerhard's here, Jared's here, and we're talking about the infrastructure for changelog.com
in 2019. It's changed since the initial time we did this show, which was episode 254, back in June 2017.
Is that right? Yep, that's right. That's right. Almost two years.
Almost two years. So new infrastructure. What's new? What's going on here?
Everything is simpler than it was before. We've learned a lot. We've cut a lot of the fat that
we had in the past. We've learned from our mistakes. We haven't made that many new mistakes,
which is always good, but we have learned a lot. And one big difference is that not only have we embraced all our partners fully with
this new infrastructure, but also we have blogged about it on time, and we have shared
a lot more with the audience and with our users than the last time we did this.
That is true.
I like how you added the on time there.
I almost feel like that was self-inflicted, not like Jared and I are saying it's on time.
There's a subtweet in there.
Yeah.
You did blog about it on time and very well,
a very nice post,
which we'll of course include in the show notes.
Possibly a lot of our listeners have read that post.
I know a lot of our listeners two years ago
enjoyed episode 254,
and I was wondering why, and a few people
told us. I think that, you know, our infrastructure is a nice example of a real-world application.
It's not a large application, it is a custom application, but because we have, at least
Gerhard and I with regards to the deployment, intimate knowledge of the needs and the decisions and all these things.
It's less Adam and I poking at a black box that somebody else owns
and more us talking about a black box that we participated in creating.
And so there's some real-world takeaways, we hope.
So when we did this two years ago, Gerhard, why don't you just real quickly,
you said lots of change, we've simplified.
And we're going to focus, obviously, on the new infrastructure. But in order to talk
about simplification, can you at least review the old
way we deployed changelog.com before this new shiny?
Yes, of course. So two years ago, we were using
Ansible to do a lot of the orchestration
and a lot of the tasks running on our infrastructure.
We were using Concourse for our CI, which we had to run ourselves. It was not a managed service.
And this year, so there's no more Ansible, there's no more Concourse. Everything that runs
is self-contained within Docker, within Docker Swarm. So we have services,
everything is nicely defined. And the core of the changelog infrastructure is a Docker stack.
And at the core of driving that infrastructure, previously Ansible, basically Gerhard has replaced Ansible with, pardon my French, just a badass Makefile. If I might just break our own rules here, Adam,
and go explicit for a second.
A makefile that does all these things.
It's kind of amazing.
Yes, the makefile was an interesting twist
to what we had before,
because before we had to run all these commands
and there were a couple of scripts
which you had to remember.
And it felt a bit ad hoc.
Having a single makefile, what it gives us is a
single place to go and to look at all the things that we could do. So for
example, if we just run make in our repository, the changelog repository, we
see all the things we can do; in Make they're called targets. So just to give you a couple of
examples: make ssh, and we can SSH into hosts.
Or, for example, make contrib, which allows someone that's cloned the repository for the first time to make contributions to changelog.com. And that's something which you did not have before.
So there are a couple of very nice things that make life easy for developers, who are also operators.
The barrier of entry is very, very low,
and it's just easy and nice to work with.
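To make that concrete, here is a hedged sketch of the workflow being described. Only make ssh and make contrib are named in the conversation; everything else is illustrative.

```sh
# Clone the repository and let the Makefile be the single entry point
git clone https://github.com/thechangelog/changelog.com.git
cd changelog.com

make          # running make with no target lists the available targets
make contrib  # spin up a local contributor environment (first-time setup)
make ssh      # open an SSH session to one of the hosts
```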
Where did you get your make skills?
Because I've opened make files.
I've looked at them and thought, oh, wow.
A lot of them, I think, are auto-generated
from like AutoConf or something.
But mostly it's like, I'm going to compile
the software using make,
and that's what I'd call the core use case.
But obviously there's
lots of things that you can do with Make, and you're doing a lot of them. It almost is like a
scripting tool. Where'd you get these wizard Make skills? Where'd you learn this stuff? Okay, so
that is a very good question, and I was convinced you were going to ask me that, actually.
One of my mates was saying, I will listen to the Changelog show that will come out, because I want to learn more about Make.
So I was sure that someone will ask me this question.
I'm happy to be your pawn.
Yeah, go ahead. Please answer it.
So I didn't know Make until a couple of years ago.
I mean, Make is this really old build tool.
And when I say really old,
I mean in the best possible way in that it has these decades of experience that went into it,
a lot of sharp edges for sure, but there's a lot of good documentation and a lot of good examples
as well as bad ones. But all put together, it's a mature tool. So I first picked up Make while joining the RabbitMQ team.
The RabbitMQ team's entire build system is based on Make
and on a tool called Erlang.mk.
Erlang.mk is a tool that's built by Loïc Hoguin,
and he's also the creator of Cowboy,
which is the web server that Phoenix uses behind the scenes,
as well as a bunch of other projects.
So Make is still very present in the world of Erlang
and definitely in the world of RabbitMQ,
and that's where I picked up a lot of the Make skills.
So is it a tool that you commonly go to?
Was this out of the norm for
you to reach to it for this particular task? Or is it just one of those second nature kind of
like your hammer nowadays? I think it's a bit of both, to be honest. I used bash for a lot of
things. And I had scripts all over the place. And what I've seen in Make was the potential to make all these different scripts saner.
So I really like how you can have Make targets that depend on other Make targets,
and it has a certain composability that Bash scripts didn't have. It's also simple in that
it comes pre-installed on every system.
I mean, on Mac, you just need the Xcode tools and you have Make.
Okay, it's an old one.
It's version 3.
You want a newer Make to benefit from some of the newer features.
But all in all, it's omnipresent.
It's almost like Bash.
It's everywhere.
You don't need to do anything special to get it.
And I've seen the potential of just improving a little
bit an old approach that I had for many years. And it worked well. So I could see it work on a
big project, which has many dependencies, which has many things that have to happen. And I just
got to appreciate it. Very cool. Well, let's back up a step, because maybe we've gotten
ahead of ourselves. And I am totally to blame here, because I want to hop into the nerdy details of this Makefile, but
we described a little bit of the deployment infrastructure that was there previously, and we
talked about simplifying it, but we haven't actually described what it is. So you mentioned
Cowboy is an Erlang-based web server that Phoenix rides on, so to speak.
Just to take the Phoenix metaphor, I guess.
Does a Phoenix ride? I don't know.
It's on top of Cowboy.
So what is Changelog.com? So I can answer that one, because I work here.
Changelog.com is an Elixir-based Phoenix application,
so it runs that technology stack.
And it was previously deployed a couple years
back using what you said there, Gerhard: Ansible, Docker, Concourse CI. And we can talk a little
bit about each of these moving parts, and one of the goals this time around was, let's get
rid of some of the moving parts, let's slim it down if we can, make it simpler to run,
simpler to understand
for me, as not the primary author of the infrastructure.
And generally, try to remove as many parts as we can,
rely more on modern technologies
as well as more on Linode,
their load balancers, for example.
But the base of it is,
it's a Phoenix,
it's an Elixir-based application
that uses a proxy, Nginx proxy.
It has a Postgres database.
It has requirements for some local file storage
because that's how we upload our episodes
into the system and process them
with ID3 tags and whatnot using FFmpeg.
And so those are kind of the constraints that we're working inside of,
what we're trying to actually deploy.
And then the way we go about deploying is what we're changing
and what we're trying to improve upon.
So when you describe the new setup,
you start off by saying that Changelog.com is a simple three-tier web app.
We've talked about the makefile.
You mentioned that we've simplified removing Ansible,
and we're using Docker Swarm.
Can you go a little bit step-by-step through those tiers
and through the way it looks,
and then we can talk about some of the decisions
that we made along the way
and what we were trying to get accomplished.
But first of all, get a lay of the land,
not as it was back in 2016, 2017, but as it is right now.
So changelog, as Jared very well put, is this simple, well, it's more complex, but it's just
like just a web application. It seems simple, right? It seems simple from the outside. Yeah,
it does. It does a lot of things, a lot of heavy lifting. It does a lot of just processing,
which I think would be very complicated
if you were not using Phoenix
and if you're not using the Erlang VM
and we can circle back to that.
But it's just a web application
in a nutshell,
which needs a database.
And it's three-tier in that we have a proxy in front. It's a little bit more complicated
than that because the proxy does a couple of things, but it's just your typical web application.
So the core, this core like database, application, proxy, all is described as a Docker stack. So a Docker stack you can
define your services that make up your stack. In this case we have a bunch of
services, there are a couple more that we define, but they form the core of
changelog. The nice thing about this is that because we describe the core of changelog in a Docker stack,
we can spin it up locally on a machine that runs a Docker daemon that is configured to run in swarm mode,
which is very simple, just docker swarm init, and your Docker Desktop all of a sudden runs in swarm mode
where you can deploy Docker stacks.
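As a rough sketch of that local workflow (the stack file name and stack name below are assumptions, not taken from the repository):

```sh
# Put the local Docker daemon into swarm mode
docker swarm init

# Deploy the declaratively defined services (app, database, proxy, updater)
docker stack deploy -c docker-stack.yml changelog

# List the services that make up the stack
docker stack services changelog
```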
That allows anyone on the team to run the core of changelog. When I say the core of changelog,
I mean everything except the IaaS bits.
So anything that, for example, can only run in Linode, like a load balancer, a NodeBalancer in their case, or a block storage volume,
obviously you couldn't get that locally, but you can get everything that makes ChangeLog. And we can set up production, identical copies of our production
locally because we describe everything declaratively in this Docker stack. So this
description of what ChangeLog is used to exist in these Ansible tasks. And it was a bit difficult to
understand. You couldn't go into one place and see it all. It was like a file over here and a
file over there, and this thing had to run, and then that other thing had to run, and they had
to run in sequence. And eventually you would get changelog with all these components, which was a bit more difficult to visualize and understand.
So we have the Docker stack that describes what changelog is at its core.
And then we have all the IaaS components that are around it.
As I mentioned, the NodeBalancer terminates SSL connections and gives us the high uptime.
It's all managed by Linode and it's very nice to work with.
That has a couple of benefits. For example, we can define multiple versions of changelog and we can
just modify the balancer to route to different versions, which is when we do breaking upgrades,
which hasn't happened yet, but I'm sure it will, it's a nice way of making them with minimal downtime
with minimal impact on users.
In front of the load balancer, so we always had this,
we had Fastly, which we use for CDN,
which basically caches all the content,
has some nice features, for example,
we get IPv6 for free, we get HTTP2 for free.
So everything that's static benefits from these features,
which are all Fastly features.
And we just leverage them.
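If you want to see those features from the outside, a quick, hedged check with curl looks something like this; the exact response headers are typical of Fastly but not guaranteed:

```sh
# Look for HTTP/2 and CDN cache headers on the static content
curl -sI --http2 https://changelog.com/ | grep -iE '^HTTP/|x-cache|x-served-by|via'

# Force IPv6 to confirm that path works end to end
curl -6 -sI https://changelog.com/ -o /dev/null -w 'IPv6 status: %{http_code}\n'
```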
It's much better than building yourself, right?
That's for sure.
That's a very complicated problem to tackle yourself.
And you definitely couldn't do it as a one-man operation.
It's too complex, too many variables,
too many things to worry about.
Well, it seems like in this day, you may think that CDNs are pretty well known, but some people
might just reach for block storage, for example, because if it's just assets, maybe like, well,
I want to store this on my own, not think about robotic caching or all these other features like
IPv6, for example, and the logging we have, the streaming logs and stuff like that. So it's interesting that some might just reach for block storage,
but in this case, a CDN really gives us some superpowers
that we didn't have to build ourselves.
Exactly. So the truth is that we still use block storage,
but we use block storage, which is a Linode feature,
which is completely opaque to the CDN. All the CDN does,
it will read some static files from somewhere, it doesn't matter where they come from, and then it
will cache them. So that whether we use block storage or whether you use like a local volume,
that doesn't really matter as far as the CDN is concerned. Once the file is in CDN, it's in the CDN. The CDN just makes sure that our content
is quickly accessible, MP3s, especially MP3s,
through the entire world.
So they get served from the edge locations
rather than from the Linode data center
where the change log application runs
and where the source and the origin
of all the media and static assets is.
Which is really important for us because we literally have a worldwide audience. Like we have listeners in Japan, listeners in South America, listeners in Africa, New
Zealand, where you're at.
Not where you're at in New Zealand, but where you're at in London.
You're in London, right?
Yeah, that's right.
That's right.
You know, all over the US, North America.
So, I mean, we literally have a global audience. And so we had to have a CDN to
truly respect that. I think an average MP3 is around 100 megs, roughly. But still, yeah, that's a
decent amount to pull down. And you wouldn't want to pull it from Dallas, Texas or, you know,
New York or New Jersey or something like that. You'd want to pull it from your local POP
so you can get it faster. Definitely. That's definitely a good thing to have.
And in our case,
the cherry on top is how we can just
get IPv6 and HTTP2
with no effort on our part, simply
because our CDN provider does that.
So that's great.
Thank you, Fastly, for that.
Quick
question about the block storage.
And this is like a does-Linode-support-this-feature
kind of question, so maybe I'm not asking the right folks, but maybe, Gerhard, you know: could
you set up block storage in like a CDN mode, where they'll basically distribute that
storage geographically, and it could be like a CDN block storage? Because that
would be cool. I don't know whether Linode has that feature, to be honest.
I'm more familiar with Google's cloud.
I work a lot with Google's cloud,
and I know that their storage volumes,
they have different modes in which they run,
but by default they have multi-region.
So basically every write goes through to three separate... actually, that's multi-region.
Sorry, that's Google Cloud Storage I'm thinking of, which
is similar to S3. I'm sorry, I was thinking about the volumes,
the... gosh, I forget what they're called now. The persistent disks. The
persistent disks which you get are actually replicated; all writes are in
three separate zones. So it's one region, but three zones. There are the SSD drives that you tend to get, which is similar to the
Linode's block storage. As far as I know, Linode doesn't have something similar to S3, which is
more like object storage, which is what we would want to make sure that we are consuming an API
when it comes to our storage. We're not consuming devices, which is what a block storage is. It just presents a volume,
you mount it, but it's not like on the physical machine. Right. And when you say physical machine,
you mean virtual machine? The virtual machine, of course, yes. It's not on the virtual machine,
yes. The physical virtual machine, yes.
By default, though, S3 is not a CDN, though, right?
You still have to layer on some CDN on top of even S3 if you were going to use S3 for object storage, for example.
That is correct, yes.
Right, CloudFront.
But the difference between, for example, block storage and S3
is that S3, you consume it by an API,
and you work with objects.
You don't work with files.
I mean, you can, but its performance is not good
if you do that. There are different modes in which you can use it, but they're,
I wouldn't say hacks, they're not first class either, but it's not how it's meant to be used.
It's meant for put a file, get a file; you have operations like that. Yeah. And maybe we can... let's stop for a moment and I will explain
a little bit, because people might be wondering why we are not using object storage.
And Gerhard, you asked me this as well, like, why do we need local storage? Because in the age of 2019,
or I guess when we started this in 2016, even then, I had been using Heroku
for many years, and they will not allow local storage of files, right? It has to be transient,
it has to be ephemeral, and you store things in Memcached, or you store them in S3, or, you know, services.
And the reason is very lame and it's very simple: it's easier for me to develop software against the concept of
local files. It's easier for uploads, it's easier for metadata, it's easier for FFmpeg. We just
shell out to FFmpeg to do our ID3 tags, a feature that we need, and one that, at the time, and
probably still today, would have slowed me down dramatically
to have to write an Elixir-based ID3 tag thing. Now, ID3v1 is out there, but there's
no ID3v2 library, which maybe eventually we'll write. Anyways, if you're out there and
interested, holler at us if you'd like to write something like that. But I wanted to shell out to
FFmpeg and pass it a file and let it do its thing. That's a very simple and straightforward way for me to
get things done. And so that's just kind of what the end user demanded of Gerhard, and
the end user was me. I said no, we're just going to go ahead and use local file storage,
like old curmudgeons. And that decision has had its pros and cons through the years, but here we are with block storage.
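For a sense of what that FFmpeg shell-out can look like, here is a hedged example of writing ID3 metadata with FFmpeg; the exact flags and tag fields changelog.com uses are not covered in the conversation, so treat this purely as an illustration:

```sh
# Copy the audio stream untouched and write ID3 metadata into a new file
ffmpeg -i episode.mp3 -codec copy \
  -metadata title="Inside 2019's infrastructure for Changelog.com" \
  -metadata artist="Changelog Media" \
  episode-tagged.mp3
```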
I have a little twist to that. Well, I liked what you said, and I'd
like to offer an alternative approach. Sure. It's very important for anyone to take stock of how they work and what works for them. The worst mistakes which
I've seen in my career is teams that choose something for what it could be or what they
think it could be and not for what it is. So when you know what works for you and you know what
you're comfortable with, rather than picking a tool for its merits, you need to think,
well, how is this going to work for me?
Which is a very difficult question to answer.
So rather than going to, let's say, Kubernetes, right?
Because everyone is using Kubernetes these days.
Rather than saying, we're going to use Kubernetes and that's the end of it,
I was saying, well, hang on. What's the simplest thing that would improve what we have, that would be better, without, for example, picking a tool before we know whether it fits us?
So, you mentioned local storage. It's easy. It's comfortable. It might not be great,
but it works for us. And I'm sure
like the day will come when we will replace that with something else. But it was too big of a step.
We had to make smaller steps and for example replacing the CI and changing the way we define
changelog, like the core of changelog, that was an easier step to make, which made us
just made everything simpler and nicer to work with. And one day, I'm sure the time to replace
block storage and the time to use Kubernetes will come, but it was not in this situation.
This episode is brought to you by GoCD.
With native integrations for Kubernetes and a Helm chart to quickly get started,
GoCD is an easy choice for cloud-native teams.
With GoCD running on Kubernetes, you define your build workflow
and let GoCD provision and scale build infrastructure on the fly for you.
GoCD installs as a Kubernetes native application,
which allows for ease of operations,
easily upgrade and maintain GoCD using Helm,
scale your build infrastructure elastically
with a new Elastic Agent that uses Kubernetes conventions
to dynamically scale GoCD agents.
GoCD also has first-class integration with Docker registries,
easily compose, track, and visualize deployments on Kubernetes.
Learn more and get started at gocd.org slash kubernetes. Again, gocd.org slash kubernetes.
So simplification was one of our main goals.
Another goal that we had was to allow a better integration into the GitHub
pull request flow for contributors, with running the tests
and integrating with a CI.
Concourse CI does a lot of things really well.
And Gerhard, you actually introduced it to me back then.
And I enjoyed a lot of its benefits
One of the things it doesn't do well, what I think is a rather simple feature, just because
of the infrastructure, the architecture of Concourse CI, which is really a power tool for
building your own pipelines, is that it doesn't really integrate very well into the needs of GitHub for those pull request builds.
And with that and other reasons,
we've replaced Concourse CI with CircleCI this time around.
Gerhard, walk us through that.
CircleCI, what it's doing, how we got here, etc.
So if I remember this correctly,
your exact words
were, Concourse is a black box. It's nice, but I don't understand it. Can we use something else?
That was pretty much it. Something like that. Yeah. Yeah. I mean, you didn't like the fact that
we had to manage it ourselves, which was definitely a big drawback.
It was a black box for sure.
There was an integration between Ansible and Concourse to where as a person that was just trying to use it,
I couldn't understand, like, is this an Ansible thing?
Is this a Concourse thing?
And do I change it here or there?
And what do I run locally?
What do I run in Concourse, etc.?
And I never actually grasped the entire mental model, which is why one of the major goals
to simplify was to, you know, fit this inside my monkey brain. Concourse... I completely agree
with you. Concourse is a very complicated tool, and you can do some great things with it.
For example, the entire RabbitMQ pipeline,
and there are many pipelines, are run by concourse.
And it's a beast, right?
It's a beast.
And that's overkill for what changelog needed.
Circle CI ticked a lot of boxes.
And I think the most important box,
which it ticked, at least from my perspective,
was that
they are our partners. So if Changelog is recommending, you know, hey, check them
out, check this service out, it's a cool service, how would we know if we're not using them?
Yeah. I mean, on paper they are great, but if we're not using them, we're not dogfooding them, so we don't
know how great they actually are. And when they have some scaling issues, when they have just growing pains,
we're not aware of that. So it's very difficult to empathize with the people that we recommend
to use this tool. And from my perspective, that was the biggest reason to switch to CircleCI.
Manage less infrastructure, use and embrace our partners,
and just see what the experience is like. Not only that, but because we are using CircleCI,
our users can see how we use it and they can maybe try and mimic what we do as a starting
point and then make it their own, obviously. CircleCI is a lot lot simpler in terms of what it does for us. I'm sure you can do some amazing
things with CircleCI but we don't. The reason why we don't is because
switching from Concourse, well, the reality was I had to do a lot of things
when it came to the switch and I was thinking okay what's the simplest thing
to use CircleCI to do all the things that we need,
but not the more complex bits? And are these things better suited, for example, for Docker?
Can we define this as a loop? For example, the application updater.
We have this service that runs in Docker that updates our application.
So this has a couple of benefits. First of all, it decouples CI from production. So CI is not aware about production. All CI does is run tests, obviously run the
builds, resolve all the dependencies, do the packaging, do the assets, and then it
pushes the resulting Docker image to Docker Hub. And that's it. That's where
the CI stops. It doesn't have access to any secrets, to production secrets,
which is great from a security perspective.
And if, let's say, one day you wanted to try another CI system,
it would be fairly easy because it does only a subset of the things that we needed to.
And it's not tightly coupled with everything else.
So moving from Concourse to CircleCI was somewhat painful,
because we were putting more in CI than could have, or should have, been there.
As you mentioned, moving some things to Docker,
this application update loop, so on and so forth.
Is there such a thing as, obviously it seems that way,
but like CI lock-in, where there's so many interesting tools in a platform,
and/or
an open source CI, for example, that you can kind of get locked into those particular platforms? Definitely. I would definitely agree with that, and we were somewhat locked into Concourse,
because we had Concourse do things that other CIs cannot do as easily. I mean, sure, it's possible, right? Anything is possible, but at what cost?
And in our case, we almost had to take a baseline,
and we had to use a CI for the things that we wanted,
like Jared mentioned, PRs.
Concourse wasn't good at PRs for us.
It can do them, but they're not as first class
as some of the other things.
So PRs in CircleCI are very easy.
They're so easy.
And that was a nice feature, which we embraced.
But for example, orchestrating our infrastructure,
I would not have put that in any CI, to be honest.
Concourse is a bit more than a CI.
As Jared was alluding to, it's this automation platform, almost.
it does so many things
and some it does them really well
but it's a very complex system
it's not a CI
it's a lot more than a CI
So CircleCI, we just use it to
run our tests, as I said,
when there's a pull request,
run our tests
resolve all the dependencies
package assets and then produce a build artifact.
And that's it.
Walk me through a lifecycle then of what it means to be continuously built and monitored.
So as we do local changes, we're fleshing out a new feature, we ship it to GitHub.
Walk us through the lifecycle of what it means to be continuously deployed and then also monitored. So let's take the most common path, which is the one
where someone pushes a change into master branch. So we're not creating any branches,
we're pushing a small change to the master branch. When the master branch updates, CircleCI
receives the update via webhook and it runs the
pipeline. By the way, we can also define pipelines in CircleCI, which is very nice.
The pipeline, all it does is the first thing, it resolves the dependencies,
makes sure that we have everything we need. Then it runs the tests. It compiles
assets, any assets that we need. This is the CSS, JavaScript, images,
and then it publishes an image to Docker Hub. We have this application update, which I mentioned,
runs within our production system and it continuously checks to see if there is a new image.
If there is a new image that was published to Docker Hub, it pulls that image
and it will spin up a new instance of changelog.
If this new instance passes all the checks, there are a couple of health checks which we define,
then it gets promoted. So we have blue-green deploy. It gets promoted to master and to the live application.
And the old application just gets spun down. If there are any issues with a new application,
for example, it doesn't pass its health checks, the deploy fails.
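As a rough outline of that push-to-deploy path in shell terms (the real steps live in the repository's CircleCI config and Docker directory; the image name here is an assumption):

```sh
# What CI does on a push to master: resolve dependencies, run the tests,
# build the image (app plus compiled assets), and publish it to Docker Hub.
mix deps.get
mix test
docker build -t thechangelog/changelog:latest .
docker push thechangelog/changelog:latest
# From here on, CI is done; the in-production updater service notices the
# new image behind the latest tag, and swarm health checks gate the promotion.
```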
And we get a notification by using Rollbar in our Slack. So whenever there's a deploy, whether it's a good one or a bad one, we always get notification
in our Slack via Rollbar.
So you can see what gets deployed, who deploys it, and when it was deployed. Which, by the
way, you can participate in all this via changelog.com slash community. Free sign up, grab a Slack
invite, free Slack invite, hop into #dev, free #dev, all things free here.
And you can see those roll bars flying in throughout the week.
And maybe you'll get annoyed of them and maybe you'll leave the room.
But if not, you can hang out there and see the stuff that's being deployed.
Just a quick pitch to come hang out with us in our community Slack.
I'll add a little more to that.
I think Gerhard mentioned this earlier too,
the invitation of being able to use some of these partners,
but also being able to see CI.
And you mentioned, Jared, too, about a real-world app
and being able to see how it's done in open source.
It's not a very sophisticated app.
It's pretty easy to jump in, but the invitation is there, really.
I think that's what you're really saying here is that
if you're out there listening to this and you're like,
how in the world do they do this?
And you're listening to this show, follow Jared's instructions.
That's how you get in.
You're invited.
Everyone's welcome.
Come check it out.
See how it works.
Ask questions.
We love that.
That's actually something which I picked up from you when it came to ChangeLog.
Not everybody that runs a business
is as open as you are.
So not only do you develop the entire application
in the open,
but we share everything we learn, right?
All the fixes.
And I know that Nick, he's Nick Janetakis.
He's one of our Phoenix Elixir fans and he's in the Dev Channel all the time.
And he keeps mentioning us on, what was it?
Not Stack Overflow.
Hacker News.
Hacker News.
That's the one.
Thank you.
Thank you, Jared.
And so he's very excited to see that he can pick up a lot of the things that we learned.
And he can also get to share when he learns something.
Because it's open source, he can do PR.
And all of a sudden, our application is better.
And everybody benefits.
And that's the one thing which I loved about ChangeLog, how open it was.
Jared, you were the first one that open sourced the entire new changelog.com application. I thought that was great. And then later on,
I did the same for infrastructure, but because it was done so late, it made little sense to the
users. I mean, they've seen this finished thing. While with this new infrastructure, the entire
code base, everything we do is in the same repository as the application.
We use a monorepo, right? It's one happy party. Right. It's one happy party and everyone's
invited. That's right. That's right. Come have fun. Come have fun. That's exactly right. And
that's the one thing which I always appreciated, even love, right? The way we did things at
ChangeLog. I really, really like that. You can even see what
things we have in flight. For example, Raul Tambre, he wanted IPv6 support. And the reason why we
did that is because he said, hey, I would like IPv6 support. And this is where we could start.
And he even did PRs. And that was one of the nicest features to work with the community. It was really, really nice.
And actually, now that I think of it,
PR246, where we made it easier for everybody to contribute to changelog.com,
that's what started this new look at how we use Docker
and how can we improve our use of Docker
so that anyone can benefit from this simple approach.
Yes, and the advantage of having an identical setup in development and production, which
I've never been the beneficiary of.
I always have slight differences.
And Gerhard, you and I had slight differences throughout this process as you were on Linux
and I was on Mac and we got to iron out those kinks.
But just having the exact same circumstance... and even today, I
still run the local version, and then I'll kick over to the local Docker version, because
I still have my setup, my old-school setup, and realize, you know, I have an actual clone, so to speak,
of what production is. And it's much better that way. You're converting me. You're converting me into a Docker fan.
I'm very glad to hear that.
And a little secret is that actually there are three layers.
So the first layer is your local dev setup
where everything runs on a Mac or Linux
and you need to install on your machine Erlang
and Elixir and everything else.
Then there's the contributor setup, which is the one that we did in PR246. And that's where we're using Docker
Compose. You just run make contrib, or docker-compose up, and then Docker Compose orchestrates all the
components. And there's a third approach, which is actually closest to production,
where there's a local stack.
The reason why we use a local stack is because some things are different, like certificates,
for example.
We don't have SSL locally.
We could, but it's not something that we run locally.
Our SSL, by the way, is terminated by the load balancer.
So that's something that even if we did have locally, it would not reflect production.
Right.
So they're like these three tiers.
I think you're between the first one and the second one.
I don't think you're that used to running a local copy of production locally yet,
but I think we're getting there.
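In shell terms, the three layers Gerhard describes could look roughly like this; the first command is the standard Phoenix workflow, and the other two echo what's described above, with file and stack names assumed:

```sh
# 1. Local dev setup: Erlang, Elixir, Node, and PostgreSQL installed natively
mix phx.server

# 2. Contributor setup: everything orchestrated by Docker Compose
make contrib          # or: docker-compose up

# 3. Local stack: closest to production, a Docker stack on a local swarm
docker swarm init
docker stack deploy -c docker-stack.yml changelog
```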
While we're talking about Docker,
let's loop back to something that you said earlier,
which has to do with running in a loop.
I think one of the neatest little hacks, so to speak,
little tricks that you pulled off is this self-updating Docker container.
So basically, as you pointed out, once CircleCI has a build,
it publishes it to our Docker hub.
And that's the entire application, right?
And then on the production host, there is a Docker container
whose entire purpose is to update the other Docker containers.
So...
Am I describing it correctly, or just the app container, basically?
Just the app container, that's right.
It's called the application updater,
and its only purpose is to update the application.
But it does that...
Is this custom code? Is this a solution you're using, or the application. Is this custom code?
Is this a solution you're using?
Is this standard practice?
I just don't even know.
I don't know what it is, to be honest.
I like his giggles before he answers on certain things.
It's like he's revealing this thing, but it's not, obviously.
He's waiting for us to ask that.
These are actually all good questions, which I very much enjoy.
I have to say that.
So the gist of this application updater is literally a while loop and Docker service
update.
So it's a Docker service updates running in the while loop, which is like three lines
of code.
That's it.
That's how simple it is.
And the reason why it can be so simple-
Where does that code exist?
Does that code exist in a Docker file or as part of the Docker stack?
Where does that actual code exist?
So there is a Docker file that builds the container,
the container image, which runs this code.
And the image gets deployed with the entire stack,
part of the Docker stack, onto production.
So you get the application, you get the database, you get the proxy, and you get the application updater.
Right. So is the code for the application updater, is this Docker service update running in a loop?
Is that code in the Docker file that then gets built as the image?
Or does that code
actually exist in our code base? The code, the codes as with everything else exists in changelog.com.
Right. It is a script. It is a script in changelog.com, which gets built into a Docker image.
Okay. So, like, for example, we have a Docker image for the application,
which defines what the application is.
We have a Docker image for the proxy
where all the legacy assets are stored.
So we have a Docker image
that has this application update to code,
which is a subset of what we have in the application,
in the changelog repository.
And it's literally just like a script,
a simple script, which has a while loop.
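A minimal sketch of what a loop like that can look like, with the service name, image name, and sleep interval all assumed rather than taken from the actual script:

```sh
#!/bin/sh
# Keep checking whether the `latest` tag points at a new image digest;
# when it does, `docker service update` rolls the service onto it.
while true; do
  docker service update --quiet \
    --image thechangelog/changelog:latest \
    changelog_app
  sleep 30
done
```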
Okay, so which file in the repository?
I can answer this one. Are we going to find this code Gerhard? Which file is it in?
So in changelog.com, there's a Docker directory. Okay. In the Docker directory is the update service
continuously file. That's the question I was asking. There you go.
Sorry, some questions are hard for me.
No, no, no. Especially the simple ones.
I'm mostly razzing you because it's just funny to ask you to continually drill down on the
specifics, but also because I wanted to look at it while you're talking.
So I just continued to.
There it is.
Back.
What's it called?
Something continue.
Update service continuously.
Boom.
Okay. I'm with you.
I got the file open.
He's also linking this up in the blog post too,
so that's why I'm tracking a little bit closer
because when he talks in the blog post about the Docker service,
managing the lifecycle of the app,
he points to the running loop via a link,
which links to the code that you're talking about, Jared.
That's exactly right.
I'm staring at that word running in the loop,
and I just never clicked on the link.
Yeah.
I did cause I was curious and I was like,
what does this mean?
That's all right.
I think I got my money's worth with getting Gerhard to answer.
It's better.
Okay.
And so it,
is it basically pinging the Docker hub or is it just saying this Docker
service update?
I guess right there,
do Docker service update dash dash quiet.
And dash dash image.
Dash dash image specifies an image. If the SHA of the image changes,
it will pull the new image and update the service.
So you know how in Docker you have images
and the images have tags?
So we always track latest.
But latest is just a tag that points to a SHA, which is similar to
a Git SHA that points to a unique commit. So the latest tag, when it points to a new SHA,
then the Docker daemon knows, hang on, there's actually a new image, even though the tag
hasn't changed, but the tag is pointing to a new image. It's almost like master, the master branch, which changes as you have new commits. The latest tag in Docker is exactly the
same. It points to a place which changes over time. And when the CI updates the latest tag for
the change log image, this loop and this Docker service update knows it has to
pull the new image down and update the service which uses the image. So during that time period,
which is pulling down the new image and it's starting the new app container, do we have two
app containers running simultaneously, the old one and then the new one? That's right. So who answers the phone when somebody calls?
Internally in Docker, there is an IP, which is almost like a gateway,
and any request coming in goes to the live app.
And the live app is the one that's healthy and has passed all the health checks.
The new app, as it starts up, it's not ready
to serve requests. So it needs to come up and the health check checks whether
port 4000, whether the response is a 200 response. When the new container, the new
app container, when that health check passes, the service knows to
update the internal routing to point to the new app instance. It's all automatic. It's all managed
by Docker. We don't have to do anything. We used to have lots of scripting that used to make this
switch for us, which is still in the old infrastructure repository where all this was
kept. It was complicated, it was custom.
We're not using that anymore.
We're just delegating to the Docker service update,
which manages all this lifecycle for us.
Which is smart because we're getting it for free.
We've had to write that ourselves before, in Ansible and Concourse land.
Exactly, exactly.
And now we don't have to worry about that.
It's all managed for us.
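For reference, the kind of health check described here (an HTTP 200 on port 4000 gating the blue-green switch) can be attached to a swarm service roughly like this; the command, intervals, and service name are assumptions, not the actual configuration:

```sh
# Fail the task if the app doesn't answer with a 2xx on port 4000
docker service update \
  --health-cmd 'curl -fsS http://localhost:4000/ || exit 1' \
  --health-interval 15s \
  --health-retries 3 \
  changelog_app

# Watch the rollover: the old task keeps serving until the new one is healthy
docker service ps changelog_app
```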
And we will do something similar if we use another system. It's a property that's very desirable.
This blue-green deploy never take production down until the new version is ready is very desirable.
And we had it for a long time now, even though we did it ourselves, which was difficult to
understand, difficult to maintain. And because it didn't change, things didn't go wrong.
But if things started going wrong,
it would have been this code that's written a couple of years ago,
and it is what it is.
So now if it doesn't go right, instead of calling Gerhard,
we call Solomon Hykes, or I guess he's moved on.
We call Docker and say, what the heck, this health check isn't working,
or this Docker service update
is failing. Is that the proper scale of the escalation now, when we have issues, is blame Docker?
I think to a degree, yes. I mean, that's the same thing, like, for example, when something
doesn't work with Linode, what do we do? We need to call Linode. Or something doesn't work with Fastly. Right. No,
we call Fastly.
And so that's,
I suppose the trade off of having someone else do something for you.
But I think it's a price worth paying, knowing that you don't have to deal with any of the complexities that go into
one thing.
And even though it might look very simple,
I mean,
if you look at some of our other scripts that we used,
I mean,
it's not that simple and a lot of things can go wrong. So it's a, it's a nice thing to delegate. Yeah, absolutely. Anytime we can pass
the buck, let's pass it right on. One last question while we're down here in the mucky muck
of this updater: how does the circumstance work in which a
slightly more complicated push, which also has some database migrations in it... so
how does the system do the update and some sort of manipulation of the database, which is on the
block storage? Right. When the new application instance starts, it will run the database migration. And this is not optional, it always does it.
If the database migration makes the database incompatible with the running application,
the live application, it won't crash, but parts of it won't work.
I mean, they may stop working. But this being Erlang and being Elixir,
it's just like basically some processes will start crashing inside the Erlang VM.
When it comes to, for example, I think this is mostly in the admin area
because most of the website is static.
And once like what the users see, once we generate that static content,
it doesn't go to the database.
Right.
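As an aside, "run the migration on start, always" can be wired into a container entrypoint roughly like this; a hedged sketch using standard Mix tasks, not the actual entrypoint from the repository:

```sh
#!/bin/sh
# Hypothetical container entrypoint: migrate first, then start the app.
set -e
mix ecto.migrate     # apply any pending database migrations, unconditionally
exec mix phx.server  # hand the process over to the Phoenix application
```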
So that is, this is definitely one way of, let's see, screwing production, right?
So if you have a bad migration or something that does something, breaking change to the database, it would take your production down.
But let me ask you this question.
Okay.
What would the alternative be?
The alternative would be if I have a bad migration,
it would never promote that app container,
except that we would have to have a separate database instance or something, right?
Yep. Because you've already migrated the database, so the app container doesn't really
matter, because the database is in an unknown state. I guess the alternative would be
an auto rollback... I know, things get complicated quickly. I know that. And they get very complicated. And especially
with a system like... Exactly why we don't do that. Yes, exactly. It's something to be aware of. It's
something that, you know, if it happens, it's bad luck, but you always need to be mindful of this
thing. And the alternatives are very costly, both time-wise and effort-wise, and do you need that complexity?
So as I was mentioning earlier, I work on the RabbitMQ team where distributed stateful systems is the bread and the butter of what we do.
Any sort of rolling migrations are extremely, extremely complicated.
And that's why, like, how do you upgrade a RabbitMQ cluster? Most of the time, rolling
upgrades work. But when we introduce breaking changes at a protocol level or at a database
level, at a schema level, we recommend to deploy something on the side, like Blue-Green Deploys.
And if we do that with something like PostgreSQL, imagine setting up a database copy,
what happens with the writes that arrive to the database while the database is running in
production. How do you migrate them? How do you basically move them to this new database instance?
PostgreSQL is a single instance, not a cluster, which complicates things even further. So it's a complicated problem, which I don't think we need to solve, to be honest.
This episode is brought to you by our friends at Rollbar.
Move fast and fix things like we do here at Changelog.
Check them out at rollbar.com slash changelog.
Resolve your errors in minutes and deploy with confidence.
Catch your errors in your software before your users do.
And if you're not using Rollbar yet or you haven't tried it yet,
they want to give you $100 to donate to open source via Open Collective.
And all you got to do is go to rollbar.com slash changelog, sign up,
integrate Rollbar into your app.
And once you do that,
they'll give you $100 to donate to open source.
Once again, rollbar.com slash changelog.
We didn't go deep enough into monitoring in the last segment, so let's do that now.
So we have Rollbar, we have Pingdom, and this new thing I didn't even know existed until literally minutes ago.
So if you look at netdata.changelog.com, this is visibility into our CPU, our RAM, our load, all sorts of interesting stuff.
So what is NetData and kind of tail off the monitoring piece of how we run changelog?
So NetData is definitely a component we didn't mention. It gives us visibility into system metrics. So what happens on the host, on the VM, on the Linux VM that runs the changelog
application and the database and all the other components that make up changelog? I would
have to mention logs as well. So logs and metrics, they kind of go hand in hand. When
it comes to, actually there's one more component, which would definitely be the exceptions,
the application exceptions, which I already mentioned Rollbar. By the way, we mentioned Rollbar
to track errors as well,
application errors, application exceptions,
and also to track deploys.
So they kind of go together.
Because if there's code in a deploy that's bad,
you want to track it back to an error, etc.
Exactly.
And you want to see how often those errors happen,
when they started happening,
when which deploy it was introduced,
and so on and so forth. And Rollbar is really good at giving you that visibility.
When it comes to logs and metrics, I mean, we mentioned this even two years ago, we aggregate
all the logs from the entire infrastructure and we ship them to Papertrail. Papertrail now is
together with Pingdom, they're part of SolarWinds, SolarWinds Cloud, they're like this nice observability stack. So the logs
we ship them to Papertrail via Logspout, and metrics we delegate to Netdata,
which is this amazing open source product. Free, right? We love free.
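For context, both of those run as containers alongside the app; a hedged sketch of what that can look like (image names are the public ones, the Papertrail endpoint is a placeholder, and the exact flags changelog.com uses aren't covered here):

```sh
# Netdata: per-second system and per-container metrics, served on port 19999
docker run -d --name netdata \
  -p 19999:19999 \
  -v /proc:/host/proc:ro \
  -v /sys:/host/sys:ro \
  netdata/netdata

# Logspout: attach to the Docker socket and ship every container's logs to Papertrail
docker run -d --name logspout \
  -v /var/run/docker.sock:/var/run/docker.sock \
  gliderlabs/logspout \
  syslog+tls://logsN.papertrailapp.com:PORT
```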
NetData is completely free, completely amazing in that it gives you per second metrics. There are very few
monitoring systems, metric systems that give you that level of visibility. And not only
we see the CPU, the network, we can see for example TCP sockets. And when we first introduced IPv6, the one thing which we noticed, this was on the
old stack, by the way, we had a TCP socket leak. And it's something which NetData made very easy
to see. So if you go into the pull request, which is, again, public, where we discuss, sorry, it's
the issue where we discussed this IPv6 support. When we first introduced this, there was a leak.
Raul, who requested the feature,
could see it, and we could mention it and we could discuss around the metrics. So we see
very detailed system metrics and we can also see per container metrics. So we can see,
for example, the application container, how much CPU it's using, how much memory it's using. And it's all real time. It's all per second. And that means that we have real-time visibility,
but for a limited duration of time. So currently, we only display metrics for the last hour,
and that's it. And the reason why we do this is because the metrics are stored in memory.
And even though we could give it more memory, we limit it to this one hour's worth of metrics. Because we're low on memory? Well, no, we're
not low on memory, we could definitely... I know, I know, that's why I say it that way. So we're
definitely not low on memory, we have lots and lots of memory, bucketloads of memory. But the more we store in Netdata, we could do with storing in another
system which was built for historical metrics, which is Prometheus.
So the Prometheus and Grafana, they also form part of an observability stack, which I'm
very excited about, and that's something which I'm hoping that we'll be able to do in the
future, which would
give us a long-term visibility. We'll see how things change over time in the entire changelog
stack. So this net data, is it just running in another container on the host? Yes. And so if we
eventually said, okay, it's time for Prometheus and Grafana, would you just set those up as other
containers on the host? That's correct, yes.
I'm learning things.
That makes sense.
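If that happens, "other containers on the host" could look roughly like this; ports, volumes, and config layout are assumptions for illustration:

```sh
# Prometheus: long-term metric storage, scraping Netdata and the app
docker run -d --name prometheus -p 9090:9090 \
  -v /srv/prometheus:/etc/prometheus \
  prom/prometheus

# Grafana: dashboards and metric exploration on top of Prometheus
docker run -d --name grafana -p 3000:3000 grafana/grafana
```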
So, okay, long-term metrics coming out of Prometheus is a nice to have down the road.
In the blog post, you also mentioned business metrics.
I'm not familiar with these tools.
I know we did a show on Prometheus probably three years ago,
but that doesn't mean I remember any of it. And I'm here to tell you I don't. So give us... what are you talking about, business metrics? How could we use this beyond just Netdata, but longer than a day? So a business metric would be, for example, which episode was downloaded and when it was downloaded. This is something that we could track, like downloads,
which could be like a rate of episodes. We could track them over time and we can aggregate
all downloads across all episodes. That's obviously just one type of metric that we could
have. We could also track when do users stop listening, for example, to mp3 files,
like how much of them they download. And we could store all these metrics alongside everything else
in a system like Prometheus and then we'd use Grafana to visualize those metrics. So
literally anything that you want to track long term, we could store it in Prometheus, which is like a metric storage system.
And we could visualize it using Grafana, which is a metrics visualization system.
So these are metrics that we care about, obviously.
And we are currently doing that work, but we're doing it in application.
So is it easier in Prometheus than the way that I'm doing it with my Elixir code?
I think it would be a matter of delegating that responsibility to something that was
built for metrics.
Prometheus, for example, it's suited for metrics like high frequency metrics, lots and lots
of metrics that continuously change.
We could also use something like InfluxDB, for example,
which is another system also for storing metrics.
It has a slightly different target audience,
and that might be better suited for business metrics,
and that has maybe queries which are like a SQL query.
You can run SQL queries, which I think would be better suited
for the business metrics.
But I'm pretty sure that we can make Prometheus work for us for both types of metrics rather than having these separate metric system running.
And I think that with InfluxDB, I think only the core is free.
I think there's like a paid version.
I'm not sure on that because I've only used it a long time ago before it went all commercial. This was pre-version one era. I do use Prometheus every day. Actually,
all the RabbitMQ metrics, there's a new feature coming out which will be using Prometheus heavily
and Grafana heavily, and it's excellent for those types of metrics, system metrics.
The Phoenix application being an Erlang application,
there's a lot of stuff that we could use for changelog itself,
which maybe we don't all need that level of detail,
but it's nice to know that we could do it
if we wanted to.
And it's already been done for us.
We don't have to reinvent the wheel.
We can just reuse something in this context.
So for now, I'm just dumping everything right into Postgres and basically using good old SQL to slice and dice
it into things that we want to see. That being said, it's very manual. I mean, if we want to
have a new view... in fact, I just added a view today. Today? Yeah, today. A graph of all episodes' first seven days' reach,
so basically the launch-day reach for episodes on a graph. And it's like, that's us,
that's a feature that I have to develop. I would love to have a tool, maybe similar to Metabase, Adam, where we're dumping the information into some sort of raw storage
and then it's sliceable and diceable in ways that are more ad hoc
or more like a reporting tool.
Is that Prometheus or is that Grafana or is that neither of those?
So Grafana would be able to visualize metrics
and it has this concept of
backends, sorry, data sources. So you could, for example, use Grafana with InfluxDB,
with Prometheus. It even supports Stackdriver, which is a Google product.
So it supports these different metric storage backends. One of them is Prometheus.
I would need to take a closer look at all your metrics.
I'm very familiar with Prometheus,
and I would know what it can and can't do well. I haven't come across a metric that Prometheus can't store or that you can't use it for.
Maybe, for example, InfluxDB would be more efficient. Now, do we need that? I don't know, maybe. I would definitely
need to take a closer look at the metrics. But what I do know for sure
is that if you are writing code that manages
metrics, you would be better served using a system
that was built for that, and maybe writing code that is specific to your business needs. So in your case, for example, those ID3 tags and that FFmpeg work, I would be so happy if we could maybe switch to object storage and not use block storage for that type of media and that type of static content, rather than spend the time doing metrics-related work.
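As a rough illustration of the object storage idea (a hedged sketch, not the actual Changelog code), pushing a processed MP3 to an S3-compatible bucket from Elixir with the ExAws.S3 package might look something like this; the bucket name, path, and function are all assumptions.

```elixir
defmodule Changelog.MediaStore do
  @moduledoc """
  Hypothetical sketch: after the ID3/FFmpeg processing step, upload the
  finished MP3 to object storage instead of writing it to block storage.
  Assumes :ex_aws and :ex_aws_s3 are installed and configured with credentials.
  """

  def store_episode_mp3(local_path, slug) do
    body = File.read!(local_path)

    "changelog-media"                              # assumed bucket name
    |> ExAws.S3.put_object("episodes/#{slug}.mp3", body,
      content_type: "audio/mpeg"
    )
    |> ExAws.request!()
  end
end
```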
It's a bummer
to have you build out features that could
potentially serve more
people than just us.
But it's also theoretically commit-free, because if, as a user, I have questions and I want to pull from the data source without having to know SQL or have access to the server, and I can do it just in the data source itself, it gives us more flexibility. And plus, as Gerhard just mentioned, it frees you up to do more high-value things.
Right.
Exactly.
One thing which I would like to mention,
and this is very relevant, actually two things.
Grafana 6 came out, and it has this new amazing feature. When I say came out, it came out in
February. So it's been, I suppose, two months out. It has this new feature where it allows you to
explore metrics. So you were saying about having to write these queries or write this code to see metrics in a different way. Well, Grafana has this feature which allows you to explore metrics, play with metrics, and just see what data you have, what metrics you have, and which ways you can combine them. So that's the one thing, which is very cool.
Obviously, you can build dashboards, and dashboards are more static; we can give a couple of examples and link them in the show notes. But the other feature which I'm very excited about is Loki. Loki is this new Grafana project. It's part of the same stack, from Grafana Labs, and it's for log aggregation.
So all of a sudden we can ship our logs to Loki, which manages them, and it shows
them in the context of the metrics.
So when we see some, for example, maybe our database is running slow, or our application
is running slow, or it's crashing, or whatever may happen, we not only can see the metrics
that correspond to those misbehaviors, we can also see the logs, which will give us more insight.
So this combination is great to have.
And not to mention, now that you have the business metrics in the same system,
you can overlay the business metrics alongside your infrastructure metrics,
your application metrics, and your application logs.
So you can see the impact of the database being down, for example, if that were to happen: what impact does that have on the audience, or on shows, or whatever it may be?
And I don't mean just like short-term,
I mean long-term impact.
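On the application side, one small, hedged sketch of what makes that correlation possible: tagging log lines with structured metadata, so that once the logs are shipped to Loki they can be filtered and lined up with the metrics for the same request or episode. The extra metadata keys here are assumptions, not the real config.

```elixir
# config/config.exs -- standard Elixir Logger configuration (sketch only).
import Config

config :logger, :console,
  format: "$time $metadata[$level] $message\n",
  metadata: [:request_id, :episode_slug]

# Elsewhere, per request or per process, attach the values:
#   Logger.metadata(episode_slug: episode.slug)
```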
Yeah.
Maybe ignorance is bliss though,
because once I find out, you know,
what that line of code,
what that bug is actually costing us in terms of listeners, I might want to find a new career. Oh boy. Just kidding. That does sound pretty cool. So there's some other things that now I'm wanting you to help us with, to actually have a good first test case, maybe, which we can talk about offline. It's a simple metric, a business metric, that I'm not tracking yet and I want to track,
but I do not want to add it to our current Postgres setup.
So maybe that'd be a good one for Prometheus.
What else?
As we look to the future of changelog.com,
we've made big strides, we've simplified,
we've switched things out,
we've decoupled a little bit from certain aspects of our stack.
There's a lot that we didn't do. And one thing that always
comes up and what Adam asks is why not Kubernetes? We asked this last time around. Let's go ahead
and ask it again. Why are we not using Kubernetes? Okay, go ahead. Why not Kubernetes?
Why Kubernetes? Why not? Why not? We could be here all day, right? All night.
Okay. So the simple answer to that is that Kubernetes two years ago was hard, and managed Kubernetes has only recently become widely available. Linode, for example, doesn't offer managed Kubernetes yet, but it's almost there; they're very, very close to having it. What that means for us is that we get to use it and we don't have to worry about upgrades, about when things fail, and so on and so forth. DigitalOcean, for example, already has a managed Kubernetes
offering, and that's great. So, you know, that's something maybe worth considering. But what we definitely do not want to do is worry about our Kubernetes deployment. We just want to
use it. Two years ago, we would have to go to Google or some, you know, big vendor to get that.
Nowadays, DigitalOcean, Linode very shortly, and others have it, which is great for us.
So managing Kubernetes yourself is a very difficult thing and requires dedicated resources.
Is that why we said no to that?
Oh, yes.
Yeah.
Oh, yes.
And the learning curve is very steep and things are done in a certain way.
It's just another layer of abstraction.
It's almost like when we were using Concourse, which was way too complex for what we needed, while CircleCI was good enough. As an analogy, we're using Docker, for example, and Docker Swarm, which is good enough for what we need. Kubernetes would be nicer. And because we now have these managed Kubernetes offerings and services, it's something that we could definitely benefit from.
So if you're going to use Kubernetes, though, you want to go with a managed scenario rather than trying to run it yourself in most cases, unless the business is extremely rich, with endless resources.
Yes.
And unless you need something even more specific and custom, for example custom Kubernetes resources, which is a world of its own, very complicated.
But if that's what you need, that's what you need.
In our case, we don't need that.
Now, I have a secret to reveal.
I've kept some of the best parts for last.
Oh, please, share them.
All right, so with Linode,
before, two years ago, we were using Ubuntu.
And because we were using Ubuntu at the time, we had to manage Docker ourselves. And that was a pain. In this infrastructure,
we're using CoreOS, which is a container OS. And what that means is that it comes with Docker
pre-installed. It has this nice feature of auto updates, which we don't use. We don't need to go
into the details. By the way, all these are in our repository. We have all the reasons.
We have discussions with the Linode team why we didn't do it and other
alternatives. But the point is we don't manage Docker. Docker is a managed thing
for us and that's very nice. Any updates, anything like that, we don't have to
worry about. We don't even have to worry about installing it. On Ubuntu, the first thing you need to do if you want to use Docker is install it. On CoreOS, we don't have to worry about that.
So because we no longer need to manage Docker, all the things that we used to do
in the old infrastructure, we no longer have to do. So a lot of that code is no
longer relevant in the new world, which is really nice.
Because we are running Docker in swarm mode, we have a single instance of Docker and we should
have more for sure. And to do that we need to change a couple of things. For example,
right now when we provision the block storage in Linode, we do that using Terraform. By the way,
we use Terraform to manage everything. I didn't mention that earlier on, but it's a nice little
thing to have and it's very simple as well. We love that. So rather than having Terraform manage
these block storage volumes, we would need to use a plugin for Docker, which by the way, Linode definitely has and wrote for us, sorry, not for us, wrote for their users. And that was
very nice to see, and which would allow us to, for example, use Docker Swarm with multiple nodes, so that as applications or containers move from VM to VM, the volumes would move with them, which is very nice. And all this really is the
core that sits in the managed Kubernetes wrapper. Because Kubernetes, there are all these components
which give you higher level abstractions to something that runs containers. And in this case,
it's Docker.
So you need Docker and, okay,
you can replace it with something else,
but you need something that runs those containers.
And then you can use Kubernetes, which gives you a high-level API that allows you to define things the way that we do, but a bit more complex,
where you can define the entire stack
and what it means for all these containers to communicate
in all the networks and all the services, as I was mentioning earlier.
Which project is that from, Linode?
Is that KubeLinode or is that Linode Cloud Controller Manager?
I'm not sure. Let me see.
So on Linode, there is actually developers.linode.com/kubernetes, and they have the Linode CLI, which you can use to create a Kubernetes cluster.
That's what I was looking for. Okay.
Yep. Which, underneath, just deploys the type of image that we use for our VM, but it does a couple more things. It sets up, for example, the plugin that manages the block storage.
It has other plugins or integrations with the Kubernetes components
that integrate, for example, with the NodeBalancers.
So we can define more through the Kubernetes API
and manage less via Terraform.
So in a way, this is a stepping stone to managed Kubernetes, but it's a smaller step rather than the bigger one, which would have taken more. Love it. Yeah, getting there sooner rather than later. I mean,
2016 infrastructure, now we're finally on Kubernetes. Great
things, I guess. The sky's the limit now. Yep, pretty much.
Maybe on the closing side of things, where are we lacking?
We talk about the future of where we're trying to go.
Obviously, we're not done.
Typical software, it's never really done, is it?
So we're always improving.
But where are we currently not as optimized?
Say, maybe an SSL or HTTPS, things like that.
Where are we lacking that we could be improving? So one thing which I'm constantly thinking about is: what are the things that I have to spend time on that are not automated? And one of those things
is the stateful services which need updating. For example, PostgreSQL, to update it, it takes a lot of effort currently to do that.
And when I say a lot of effort, I mean a couple of hours. I don't mean days, but still,
it's effort that we should not spend. If we could define how the update should happen and what the rules for the update are... we have a cluster, so we could have multiple PostgreSQL instances and automatic rolling upgrades. That would be very nice to have. There's PostgreSQL, there's Nginx, there's all these
components which are auxiliary to the app but are also part of the changelog stack. So that's one thing which
I would definitely like to improve because it's still a manual process. We build Docker images
and we would like to automate that aspect. The other one is HTTPS for sure and IPv6.
We are almost there with IPv6. We have it enabled on Linode. We have the DNS entries.
We also have it enabled on the CDN, so Fastly. We're already using their IPv6 feature.
There are still some links in the blogs, for example. We have some images that we load from S3, and we are not using the IPv6 URLs, which we should, but that's a small thing. A slightly related thing is HTTPS. I'm going to say it's slightly related because once you're on IPv6, you want to use HTTP/2, and to have HTTP/2, you need to have HTTPS. So everything needs to be encrypted. To do that, we currently have a
certificate that we manually have to renew, we manually have to install. When I say manually,
we just have to put it in LastPass and install all the credentials. And then when you run Terraform, it just gets set up and configured on the NodeBalancer on the Linode side.
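On the application side, Phoenix already has a standard way to enforce HTTPS behind a TLS-terminating load balancer: the endpoint's force_ssl option, which uses Plug.SSL. A minimal sketch, with the app and module names assumed rather than taken from the real config:

```elixir
# config/prod.exs (sketch only, not the actual Changelog configuration)
import Config

config :changelog, ChangelogWeb.Endpoint,
  url: [scheme: "https", host: "changelog.com", port: 443],
  # Redirect plain-HTTP requests and send HSTS headers; trust the proxy's
  # x-forwarded-proto header, since TLS terminates at the NodeBalancer.
  force_ssl: [rewrite_on: [:x_forwarded_proto], hsts: true]
```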
It would be great if we could use Let's Encrypt, which I'm a
big fan of, that gives you free SSL certificates. It's a great community effort, it's a great
industry effort, and it's something that as open source champions we should definitely be using.
It's one less thing to worry about. Does our certificate run out? We don't have to worry about that until 2020, by the way. But it'd be nice to have automated SSL certificates for us via Let's Encrypt.
You know, we could just throw the old SSL away and move away before 2020.
We don't have to wait till June.
Just saying.
I know, I know.
But we need to set up the integration with Let's Encrypt, right?
So how do we use the Let's Encrypt certificate?
And it would be nice if, for example, Linode had this feature in their NodeBalancers.
So this is where, for example,
Linode can do the integration for us, right?
And for obviously all their customers. And then everyone gets to benefit.
So is that in place now, or is that being worked on by Linode?
Well, this was a feature request. I just made it live on the show.
There you go. I'm sure Marquez is listening.
Marcus, not Marquez. Marcus.
Yeah, that's what I... Marcus Johansson. Yeah, we've had good conversations; he's a great advocate for us inside of Linode as well. You know, he listens to the show, loves what we do, and obviously wants to see us thrive. So that's awesome.
I have to say, for a lot of the questions and a lot of the things that we had to work out through the Linode API and the integrations, he helped build a lot of those components. We were in constant touch with him, and he was a great Linode representative and Linode developer, helping us with a lot of things and obviously improving things for Linode as well, and for all their other users.
So, SSL being improved, HTTP/2 being used: we've got some things we have to improve upon.
What about CDN?
I know we're kind of late in the conversation on that.
We had some weird slowdowns.
Is that worth at all touching on?
I mean, there was a lot of work that went into that to make it faster.
And maybe it seemed like it should have been more straightforward, but it wasn't.
Are you referring to the 503s?
Yeah.
That was very low level in the networking stack. It was like a layer two, layer three problem, which is very low level in how routing works, how packets get lost and routes aren't updated correctly, and stuff like that. And that was in a period when we were transitioning between infrastructures. So this
was happening still in our old infrastructure. And I think since our new infrastructure, the
problems mostly went away. But now that you mentioned that, Adam, one thing which I would like us to do,
and this is somewhere where Jared and I somewhat disagree, is to cache more at the CDN.
So that if the VM is down, for example,
or like if Linode was to have like an issue in the data center,
our static content that was cached would still be served.
So changelog would still be served.
So changelog would not go down.
As you said, too, most of the stuff we do is somewhat static, too.
I mean, once it's out there, it's sort of done unless there's an update, which is fairly infrequent.
I do not disagree that we should do that. I disagree... how do I say it? I realize the cost of making that change versus the value of doing other things which are higher priority. I agree with you that it should happen. It requires us to change the way that we're doing some of our programming in order to go completely behind the CDN, and so that's why I've delayed on it. In principle, I'm with you, even on speed. I would love to have all of our content delivered, especially our content that is mostly static upon read; I mean, we write it and then it's published. And we are caching things in the app, but on getting them into the CDN, I agree with you completely. That being said, the way that the app is built, and the way that we like to keep things simple, with the ability to customize responses for signed-in users, it just requires some heavy lifting to make that particular change. And so there's just lower-hanging fruit; that's my contention.
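To make the trade-off concrete, the kind of change Jared is describing would involve something like the following plug; a hedged sketch (not the actual Changelog code) that lets Fastly cache pages for anonymous visitors while keeping customized, signed-in responses private.

```elixir
defmodule ChangelogWeb.Plugs.CacheHeaders do
  @moduledoc """
  Hypothetical plug: allow the CDN to cache responses for anonymous visitors,
  but mark responses private whenever a signed-in user gets customized output.
  Assumes `conn.assigns[:current_user]` is set by an earlier auth plug.
  """
  import Plug.Conn

  def init(opts), do: opts

  def call(conn, _opts) do
    if conn.assigns[:current_user] do
      put_resp_header(conn, "cache-control", "private, no-store")
    else
      # s-maxage applies to shared caches such as Fastly; max-age to browsers.
      put_resp_header(conn, "cache-control", "public, max-age=60, s-maxage=86400")
    end
  end
end
```

The heavy lifting Jared mentions is everything around this: making sure no per-user content leaks into responses the CDN is allowed to cache, and purging cached pages when content changes.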
So this is a great example, Jared, of understanding the landscape really well,
and knowing which steps are worth taking. So I think the CDN is very similar to my Kubernetes
in that I understand the value. I know that we should do it. But there are a couple of other steps, smaller steps, easier steps, that we should take first. And I think we have been taking them; with the CDN, things have been improving, as Adam was mentioning. We're not there yet, not for lack of wanting it, but we understand the complexity that goes with that change.
And this is the value of having a team and being a team and working together with people
who have different domain expertise or technology expertise, is that we have a dialogue, we have a push and a pull, we can disagree, we can state our cases, and we move forward together with our collective knowledge and experience, versus just making all the decisions yourself. Which is why I reached out to you in the first place, Gerhard, a couple of years ago.
I was like, this journey would be a lot better
if I had somebody with me who had more expertise
in this scenario.
Yes, I could trudge through and figure it out.
It wouldn't be anywhere near as good.
And it would have taken way longer to get deployed.
But the long-term benefit is that we bring other things to the team, right?
And we build and grow together.
And so that's, as a person who works alone a lot,
it's just an enjoyable aspect of ChangeLog
that I think is an example here
with some of our minor disagreements
around what is a higher priority?
Adam, you have your opinions on what we should be doing next
and these are things that we discuss along the way.
I think what's interesting, too, is I've learned so much more about deployment
where it seems like it's fairly easy.
Oh, just put the code on the server, and there you go.
It runs, right?
There's so much more behind the scenes to having performant production code
that's monitored, that has failover,
has all these things that are concerns for
a modern app that not everybody really deeply understands or considers.
And I thought, you know, deploying is pretty easy, right?
But clearly it's not.
It's a lot.
It's very sophisticated.
I think that's the way it should be for most users of the system and for most developers.
But someone somewhere needs to
worry and needs to solve the hard problems. And the more experience you gain in a certain area,
the more you realize, well, actually, this is a lot more complicated than meets the eye.
And if you don't like that type of work, it's really hard to do good work and to solve tough
problems for users. And that's why some systems fail in weird and wonderful ways because they have a lot
of sharp edges.
People haven't thought long enough or hard enough, or they haven't done things the way they need to be done for things to be easy.
So when we consume, for example, Linode services or Google services via an API, or AWS, we say,
oh, that's easy. But there's an immense amount of work
that goes behind the scenes
that most users aren't aware of.
And that's the way it should be.
Yeah. How about this?
We'll turn it back on the listening audience too,
because I'm sure there's many, many more opinions
out there that we're not hearing.
So if you've heard this conversation
and you've read this blog post
and you've examined our source code and have
trudged through issues and different PRs and have, you know, just got some different insights that
you see that we can be taking different steps, help us plot out a roadmap, join the community,
changelog.com slash community. #dev is where things are happening, #SRE is there. I just
learned about that one today. It's pretty interesting. So if you want to chat with us or share feedback on new tooling, new services that we should put on a roadmap, please reach out.
As with all things, we fly by the mantra of slow and steady wins the race.
So we're in no rush to get there.
We launched our latest platform in 2016, so we've improved upon it every year since then, basically.
Thank you to Gerhard and Jared for their hard work in making that possible.
But we invite you, the community,
to share your thoughts as well.
So join us in Slack.
Join us in issues on GitHub
or even discussions on podcasts.
It's pretty easy.
I would really like that too, I have to say.
I would really enjoy hearing some feedback from the users
having put all this work out and having
made everything available.
I would really appreciate knowing what you think,
having
different viewpoints.
I always want to learn
and I'm sure
there are better ways or different ways
of doing things that I would like to know about.
Yeah, absolutely. And all are welcome. Please come. As Jared mentioned, the three frees: go to changelog.com, sign up for free, get into Slack for free, and hang out in #dev for free. It's all free around here. Enjoy the party. We love it. Gerhard, Jared, thank you so much for all your hard work on changelog.com. It's tremendous how far we've come. I can't even believe we were once on Tumblr. Wow, man, that's crazy. And today, such a different world. It just shows that we're true to our motto: slow and steady wins the race. So keep pushing forward. Thanks, fellas.
Great show. Thanks, Gerhard.
Thank you both. Thank you. Bye.
Alright, thank you for tuning in to this episode of The Changelog. Hey, guess what? We have discussions on every single episode now, so head to changelog.com and discuss this episode. And if you want to help us grow this show, reach more listeners, and influence more developers, do us a favor and give us a rating or review in iTunes or
Apple Podcasts. If you use Overcast, give us a star. If you tweet, tweet a link. If you make
lists of your favorite podcasts, include us in it. And of course, thank you to our sponsors,
Linode, GoCD, and Rollbar. Also, thanks to Fastly, our bandwidth partner, Rollbar,
our monitoring service, and Linode, our cloud server of choice.
This episode is hosted by myself, Adam Stachowiak, and Jared Santo.
And our music is done by Breakmaster Cylinder.
If you want to hear more episodes like this, subscribe to our master feed at changelog.com slash master.
Or go into your podcast app and search for ChangeLogMaster. You'll find it.
Thank you for tuning in this week. We'll see you again soon.
Because you've listened all the way to the end of the show,
got a little preview here for you of our upcoming podcast called Brain Science.
This podcast is for the curious and explores the inner workings of the human brain
to understand behavior change, habit formation, mental health, and the human condition.
This show is hosted by myself, Adam Stachowiak, and my good friend, Muriel Reese,
a doctor in clinical psychology.
It's brain science, applied.
Not just how does the brain work, but how do we apply what we know about the brain to better our lives?
Here we go.
That applied brain science really stood out to me, because I don't want it to just be data.
I want you to go.
How can this fit?
What can I take away now?
How am I going to change?
And that's sort of where you come in more. And even some of the questions... like, I want to ask you: what are some of the most challenging things about working in the tech world when it comes to relationships?
Probably the most important one is isolation.
More and more of the world and companies are, for good reasons, becoming okay
with what they call distributed teams.
Yeah.
And that means that you and I, we work for the same company,
but you work from your home office.
I work from my home office.
I might go into the office a couple times a week if I live local.
But even if I live in San Francisco, I'm still probably a remote worker,
even though I can hop in an Uber or hop on the train or whatever
and go into the office and be there in a half hour.
But why waste the time?
You know, and this is where I would revisit what I want to talk about with resonance.
And that whenever we're learning, no matter what thing,
it's really helpful when we get feedback that's both immediate and specific.
And so when you're by yourself and you don't have any interaction with other people,
how can you get any feedback?
I mean, you're losing most of the nonverbal communication and you also don't have all of the voice
inflections or facial expressions. Have you ever tried to feel sad and smile at the same
time? Try it. It's pretty hard.
Right, because facial expression is exactly what's involved
when it comes to empathy,
which is relationships.
I was reading a research article recently
and it talked about, you know,
how couples who are together
a really long time
end up sort of looking like each other.
I've heard that.
Yeah.
And so what they've looked at is when we actually empathize with other people, facial expression is really key within that.
And so when you empathize with the partner you're with over and over and over again,
your face begins to make the same creases and facial expression as it relates to where somebody
else is emotionally. Wow. Right? Say it again. So that's, that's creepy. Well, they've, again,
this is sort of the hotbed when it comes to neuroscience these days is mirror neurons.
And these mirror neurons are what are involved with empathy. And so mirroring, meaning I get another person's emotional world.
And so one of the research studies looked at Botox.
And what they found is that Botox, because it actually assists in paralyzing facial muscles,
then you can't contort your face
so you don't get wrinkles.
But actually levels of empathy go down.
Uh-uh.
Right.
Because your physical appearance
can't reflect your inner appearance.
Yeah, you got it.
And so when you're working in these remote locations,
it might facilitate better work or more focus.
And it allows people to be distributed and to
capitalize on the talents across the country, right? Yeah. Wow. So that's like a treasure trove, in my opinion, talking about it in a scientific way, you know, not just, hey, this is my opinion about all the cons of that. Because I think what we can do is still have remote work,
but do it in more healthy ways.
Because I'm fully... I mean, I've been a self-employed remote worker since 2006.
Now I'm a unique animal.
I know that.
My wife knows that.
Right.
I'm fine with it.
I'm a good human being, but I've got some flaws. And I'm willing to accept and share those to some degree. And I think the problem is we just lack maybe a more purposeful or intentional feedback loop.
Yeah.
Which I think is super important to being able to operate in this world in just good ways.
I don't know; healthy ways is probably the best way to put it in the context of this show. Healthy ways.
One of the things that's fundamental, I would say,
to being human is change, right? And so sometimes people come in and are really key in our life for
a period of time, and then things change. Either we grow or they grow or they change in a different
direction, and then the relationship changes or that feedback loop gets modified in some way. That isn't always
a bad thing. It's just going, my sense of choice actually is a critical component when it comes to
feeling good about my life. If I feel like everything is sort of outside of me and I don't
have any charge over it, like I didn't choose to work in a more remote location or I didn't choose to go to school or I didn't choose this person.
Then it feels far more oppressive as opposed to I actually participated in the outcome that I'm actually experiencing.
So I then also have more charge over whether or not I want to change it.
I think this feedback loop process that we're talking about here is super common to developers; they have this concept of agile, and basically
it means you produce something, you put it out there, and you expect the feedback loop
to happen in order to gain insights and course
correction to then release another version of it that continually and iteratively
becomes more and more improved. So this whole process
in day-to-day work in software is normal.
And I think it's interesting how it can apply to their lives and people's lives, you know,
to take the same importance of a feedback loop, for example, and apply it.
Right. Well, so this is very much how it goes in relationship, which is why there is an importance
when it comes to sort of things resonating. You ever walk into a room or an interaction with a couple other people
and like something just feels wonky or off?
You're like, I can't put my finger on it.
Definitely been there.
Right?
Well, and so to be able to identify that in relationships and even go,
wow, I need to, I'm experiencing this person in my world with the
limited interactions that I have with them. It hasn't really resonated with me. And so I don't
get good feedback. So now I'm going to be more defensive because I feel as though there's a
threat. It doesn't necessarily mean the person is threatening. However, my brain is going to tell me, hey, we need to be more protective.
We need to do some strategies so that you're not fully exposed.
One way I look at scenarios like this, I would say as of late, is because if you ever watched a TV show or a movie where the narration, the storytelling part of it, they expose a character in a certain light.
And you may dislike that.
They may be a villain or villainess, right?
Sure.
But the moment they turn the story to their backstory and why they are the way they are
or why they're acting the way they're acting.
Yes.
You then kind of fall in love with them.
You're almost rooting for them.
Right.
I feel like that's the same thing that happens day to day in our lives: there are people who seem villainous or not for us, but we don't understand their backstory and why they are the way they are, so we can't have and employ that empathy that's required to have this dance, as you say, this iteration of relationship. You know, we just assume they are who they are, and we project, you know, our worst fears onto them, and they become true.
Yes, you got it.
This is why in the absence of, you know, a face, I don't really get to engage with people in the same sort of humanness that we are all in.
And so you're exactly right. I mean, over and over and over again, because you can identify and go, oh, that's why they're harsh. Or, you know, I recently had an
interaction I had shared with someone that I was a competitive gymnastics coach for a number of
years. And so somebody thought that my response to them when
they were really struggling was kind of harsh, but they remembered that I had told them I was
a coach for so long. And they're like, oh, this is just another side of her coming out.
Right.
And I'm not sure I prefer it, but I get it. And then it switched for their reaction
because then they're like, oh, wait, we're on the same team.
She's not trying to, like, oppress me or fight back against me.
She actually is helping me, trying to get me to where I want to go.
My wife and I, we've learned this concept of goodwill, right?
Yeah.
I can take your feedback or your criticisms in a different light if I know that you have goodwill for me,
meaning that you're not trying to harm me,
that you are for me,
not against me.
And sometimes change,
as we all know,
is painful and can be painful.
So sometimes the necessary feedback and or criticism that can influence that
change can also be painful,
but I can accept it differently.
If I know that she or they or
whomever is in the scenario with me has goodwill for me you know whereas if you know that they're
not for you then you obviously take it a whole different way and that's that's an okay thing
but we often are you know in relationship with people that are giving us crucial feedback and
we need to have that kind of that lens. Like it was significant in our marriage to understand, hey, I know there are times when
you give me feedback.
I am not happy about it, but I know you have goodwill for me.
So therefore, I calm down.
I listen.
I take that in and I process it, whatever.
But I take it in a different way because I know that she's for me
and not against me. Yep. One of the key things when it comes to change is a sense of openness
and even relationally, like of going, I need to be able to see how somebody else responds or how
they're feeling, based on their perspective of what they're going through,
and not just my perspective of their perspective.
And so this goodwill is like, I believe that we're on the same side
and that you're not trying to make it harder for me.
But so I can understand if I were sitting where you were sitting,
had the background that you had, why you would have taken it in that way.
And then I can provide an opportunity to
clarify or create more connection, even when it doesn't feel good. And I honestly think this is
so much of what's missing in people's relationships. If I look at relational interactions
through the notion of conditioning, wherein I get a sort of hit of dopamine,
feel good feelings because I went to a person,
I had a conversation that didn't necessarily feel good,
but there was openness on both parties
to hear one another's perspective
that it actually then reinforces like,
oh, when I go and I have this exchange with people,
I feel better. So now I'm going I go and I have this exchange with people, I feel better.
So now I'm going to go and engage with other people and get the feedback,
even if I might not like the feedback, because now I'm buffered
and I'm not alone in this and somebody else sees my world.
That's a preview of Brain Science. If you love where we're going with this, send us an email to get on the list to be notified the very moment this show gets released.
Email us at editors at changelog.com.
In the subject line, put in all caps, BRAIN SCIENCE with a couple bangs if you're really excited.
You can also subscribe to our master feed to get all of our shows in one single feed. Head to changelog.com slash master or search in your podcast app for ChangeLog Master.
You'll find it.
Subscribe, get all of our shows and even those that only hit the master feed.
Again, changelog.com slash master. Thank you.