The Changelog: Software Development, Open Source - Prometheus and service monitoring (Interview)
Episode Date: August 7, 2015. Julius Volz from SoundCloud joined the show to talk about Prometheus, an open-source service monitoring system written in Go.
Transcript
Welcome back, everyone.
This is The Changelog, and I'm your host, Adam Stacoviak.
This is episode 168, and we're joined today by Julius Volz from SoundCloud to talk about Prometheus, an open-source service monitoring system written in Go.
Super awesome conversation today.
We talked about the data model, the query language, and all the in-betweens.
We have three awesome sponsors for the show: CodeShip, Toptal, and DigitalOcean.
Our first sponsor is CodeShip.
They're a hosted continuous delivery service focusing on speed, security, and customizability.
And they've launched a brand new feature called Organizations.
Now you can create teams, set permissions for specific team members,
and improve collaboration in your continuous delivery workflows.
Maintain centralized control over your organization's projects and teams
with CodeShip's new Organizations plans.
You can save 20% off any premium plan you choose for three months by using this code: TheChangelogPodcast.
All right, everybody, we're back. We've got a great show lined up today, one we've actually been waiting on for a bit. It was recommended by Peter Bourgon. We just talked about him and Go kit and GopherCon and all that stuff, but Peter was recommending this, Jared. Our last guest was saying that Prometheus was their tech to play with, so we had to get Julius Volz on the line here.
So, Julius, welcome to the show.
Hi. Pleasure to be here.
And also, we got Jared hanging out in the wings there. Say what's up, Jared.
What's up, Jared?
So, Jared, we were at GopherCon not long ago, so we met Julius and also Bjorn, who couldn't make this call, but we were excited to finally get a chance to have Prometheus and this conversation about metrics tracking and stuff like that on this show.
So what's the best way to open this one up?
You want to talk about Julius a bit or you want to go right into the tech?
Well, first let me say that we kind of did the hallway track at GopherCon, and we were out interviewing people and talking with everybody.
And there were two things people were excited about.
One was Ben Johnson, who we lined up to come out here pretty soon,
and the stuff that he's been up to.
And the other one that everybody was excited about was Prometheus.
Yes, true.
In fact, I think, Julius, you guys even got a shout-out during one of the keynotes.
Is that correct?
Yeah, we got a bunch of shout-outs.
I think from Peter's talk, from Tomasz's talk,
the keynote.
So yeah, really, really exciting.
Very cool.
So we're excited to hear about it.
We want to know all the details,
but I think, Adam, maybe if we start with the history,
we can kind of see why Prometheus even exists.
Do you want to start there?
Let's do that.
So Julius, you've been with SoundCloud for a bit.
Before that, you were with Google.
What was going on to make Prometheus a thing for you?
Yeah, so when I was at Google, I was actually doing something completely different.
I was in Google's production offline storage system.
So basically, we had many tens of data centers with huge tape libraries backing up
all production data that Google had. So basically an exabyte scale backup system globally.
So monitoring wasn't really my specialty there, but I definitely came in contact with it as a
site reliability engineer on that service. And when I left Google and joined SoundCloud back in 2012,
it went as it often goes.
When Googlers left Google at around that time especially,
they felt a bit naked in terms of what the open source world
provided them in terms of infrastructure.
Because at Google you have an awesome cluster scheduler,
you've got awesome monitoring systems,
awesome storage systems, and so on.
Suddenly, you get thrown out into the wild
and you miss all of that stuff,
and you feel just this urgent need
to be building a lot of that yourself again.
But when I joined SoundCloud, another ex-Googler, Matt Proud, had already joined a month prior to that,
and he felt even more strongly about this
and he was particularly unhappy
with the state of open source
monitoring systems
so he had actually already in his free time
started building
client libraries
for instrumenting services with metrics.
And his grand vision was to build a whole monitoring system.
So when I joined a month later, he kind of pulled me on board
and we started building something in our free time
that eventually became Prometheus.
So just in the first months, end of 2012,
that was really just our free time. Finally,
we got enough of it working in such a way that we could expose data from services, collect it,
query it, and maybe even show it in a graph. And that was the point when we decided, okay,
this is actually going somewhere. Let's give this a name. Let's call it Prometheus. And shortly afterwards, we started formally introducing that at SoundCloud.
And yeah, nowadays it has become SoundCloud's standard monitoring system and time series
database.
Now, deep topic aside, I've got to ask the question: one of my favorite movies out there, by Ridley Scott, is a movie called Prometheus. Is there any correlation?
I have never watched that movie, actually.
Well, we'll see aliens come out of the code at some point, right?
So that was actually funny. I think it actually came out around the same time. Okay. But it wasn't really on my radar back then. I think I had just briefly heard about it, but it wasn't really connected to this.
Okay, yeah. All right.
Yeah, Prometheus the movie came out in 2012, and I remember loving the name and not loving the movie so much, Adam. So maybe that's a separate show.
We could, yeah, we could go... I heard a lot of bad things about that. We could pause this for a minute and let me rant.
We could just go start another show.
I'm just kidding.
Maybe I should go a bit more into what we had at SoundCloud back then,
because that was kind of the big motivation to build Prometheus.
Well, you said that you felt naked.
As a Googler, you felt naked coming out of Google
and some of the things missing.
So this was obviously one of those things missing.
Right, yeah.
But you might ask, there were many open source monitoring systems, right?
Why were we not happy with those?
We're asking that question.
We like that question.
I actually had that question queued up for you.
Yes, that's the next question.
Cool.
So, I mean, back then, SoundCloud was doing this migration that a lot of companies do,
migrating from one monolithic web application to a set of microservices just because the initial monolithic application has grown too big, too complex.
People don't want to maintain it anymore.
You can't have independent groups deploying independent things. So SoundCloud pretty early on actually started adopting Go and built their own kind of Heroku-style in-house cluster scheduler called Bazooka.
And that was already a container scheduling system, a very early form.
We were using that before Docker came out, before Kubernetes and so on came out. And the challenge was that we now had these hundreds of microservices running on these Bazooka clusters with thousands of instances. Developers, whenever they built a new revision, maybe every day even, scaled down the old revision and scaled up the new revision, and all these instances would land on random hosts and on random ports.
And somehow we needed to monitor them.
So what we did back then was,
what SoundCloud did back then was use StatsD and Graphite
as the main time series based monitoring system.
So StatsD and Graphite had several problems.
So when I joined, I remember the StatsD
server almost falling over
because it was a single threaded
Node.js application running on a
huge beefy machine, but it could only use
one core. So it was actually throwing
away UDP packets
left and right. I don't know if you know how StatsD
works. The general
working model is
that let's say you have a set of web servers, let's say an API server, and you have 100 instances of
that. Then if you want to count the number of HTTP requests that happen in that entire service,
every one of these instances, for every request that they handle, sends a UDP packet to StatsD. And StatsD will count up all these counter packets from these hundred different instances, usually over a 10-second interval, and then finally sum them all up and write a single data point out to Graphite each time.
So Graphite is a time series storage system.
And StatsD is kind of in front of it to aggregate counter data into a final count per 10 seconds.
And you can do some stuff there.
Like you can say on the service side, please only send every 10th UDP packet or something.
So you alleviate the load somewhat.
But the main pattern here is that you're doing the counting on the StatsD side.
And yeah, that StatsD wasn't really scalable.
It was throwing away UDP packets, wasn't really working that well anymore.
And the other problem was Graphite's inherent data model. So in Graphite, if you store a metric, a time series, it's only a single metric name with no dimensions. It has some dots in the middle
that allow you to separate components of a metric name. And people use that to encode implicit dimensions.
So for example, you might have a metric named API.http.get.200
to count the successfully handled get requests of an API server.
And that works, kind of. It doesn't scale
too well. Graphite doesn't deal very well
with you going wild
with these dimensions.
It doesn't allow you in the
query language to be particularly
flexible about how you query
for these dimensions, and they're also implicit.
So you look at one of these dot
separated components, and you can kind
of guess what it would mean, but you only see the value; you don't see the key, usually.
Another problem there was that, due to this limited dimensionality, it was really hard to figure out which particular host or which
particular service instance a metric was coming from.
So let's say you have a global latency spike.
So if you have these counters over 100 instances,
they all get counted into one metric in the end,
and you don't really see if there's a spike.
Was it only in one instance?
Was it in all instances?
You can't really drill down there
anymore.
Some teams have actually then encoded the instance and the port, like the host and the port of an instance, into the metric name, into one of these dot-separated components. But Graphite is not really meant for that, and it blew up pretty quickly, so they had to run their own Graphite server. And that is not particularly fun, because Graphite is not so fun to run either.
So yeah, these were kind of the problems we encountered with the StatsD and Graphite combination. That was for service monitoring. So when I say monitoring, actually,
I mean different people mean different things with that.
I mean, both time series collection and trending and alerting.
Some people, when they say monitoring, they think of only something like Nagios, only something that alerts people.
See, Jared?
What?
Did you hear that? Nagios.
Oh, how do you pronounce it?
That's the European take on it. I had to butt in there, because pre-call, Jared, you mentioned that you'd set up some Nagios servers.
So anyways, you said Nagios.
So Nagios is the way you pronounce it.
Well, I don't know how to pronounce it.
That's just, you know, I used to be a network administrator back in the day, and I was the only one doing it, so you never say it out loud.
But I just thought it was Nagios, because it nags you all the time.
I thought they had a play on words.
That makes so much more sense.
But Nagios could be right. I don't know.
See?
That makes sense.
But yeah, when I think of alerting, I think of something more like that.
But you actually say service monitoring.
You include alerting in your definition.
Is that what you're saying?
I include time series collection, I include the graphing, I include the alerting. So the whole complex of getting metrics from your systems and acting on them and notifying someone.
Okay.
It's kind of just a question of definition, I guess.
Sure.
Yeah, so we used Nagios back then. Partially just running completely stateless checks that you run on a host to see if things are good right now, and partially based on Graphite-based time series. And yeah, that was fine. But Nagios is kind of also from
the nineties. Its data model is very limited.
I mean, it knows about hosts and services on those hosts.
And if you have something like a cluster-wide check
or things that just don't fit into that pattern,
you kind of have to squeeze them into that pattern.
And that sometimes works, sometimes not that great.
It's really hard to silence by any arbitrary dimensions in Nagios.
So yeah, the data model there is also a bit painful.
The UI, I think we don't even need to talk about.
Nowadays, we're actually using Icinga, which has a bit better UI.
What's it called?
Icinga is basically a drop-in replacement for Nagios.
So it uses the same database.
I don't think you have to change much.
It's just kind of a new UI.
And I think it has a bit of a different,
more scalable mechanism for executing checks.
But I'm not really an expert in that area.
Yeah, so that was for service monitoring.
And for host monitoring, we had Ganglia.
And with Ganglia, you know, you have the host as a dimensional key there, but not much else; of course, also the metric name. But there's no query language, there's no nice graphing interface, and so on. You get these pretty static dashboards with host metrics. And yeah, so we also used Nagios, of course, for the host alerting then.
This might be a little bit premature,
but I just went to the Nagios,
and we're all going to say it different ways, by the way.
Nagios, Nagios, Nagios.
They say they're the industry standard
for IT infrastructure monitoring.
What is the goal, or what was the goal, with Prometheus? Was it to, you know, redo what everyone had been doing not quite so well, because you have opinions and obviously some skills to do it? Is the goal to sort of unseat some of these existing players, or is it to just sort of rebuild something new that made sense for SoundCloud?
Yeah, definitely. So for us, the goal was to replace StatsD, to replace Graphite, to replace Nagios in the end, with a new kind of ecosystem that is more powerful and more integrated and allows you to do more stuff in a more modern way. So yeah, definitely, we hope to make people depend less on those old tools, I would say.
So we kind of sometimes jokingly call it a next generation monitoring system.
And it does try to cover all the aspects from instrumenting your services, collecting the
data, showing the data in the dashboard, alerting on the data if something is wrong,
and then sending those notifications to you.
So yeah, it tries to cover basically the whole field.
What it does not do is event-based monitoring.
So if you want to do per-request accounting,
let's say you want to really collect every individual event,
a use case like logging or a use case like
elastic search where you can really put every individual record of what happened in there
that's not really what we're trying to do prometheus is really in the business of collecting
purely numeric time series that have a metric name and a set of key value dimensions and those the metric name and the key value
dimensions uh uniquely identify every series and that you can then actually use together with a
query language to do really powerful queries to aggregate and slice and dice based on whatever
dimension you're currently interested in during the query actually and yeah so you started building this in your free
time or your you and your buddy started building it i'm curious just kind of the the inner workings
of soundcloud where they're at with open source and how much freedom they give you as an engineer
was this something that you had to sell to your boss or to the company or was it just like well
we're doing this now and whatever you guys think is the best solution
must be right?
Yeah, so this was definitely an interesting history.
I think at the beginning,
we just took the liberty ourselves
to do that in our free time.
There was a lot of resistance at the beginning
to introduce that at SoundCloud,
which totally makes sense to me,
especially in retrospect,
because, to be honest, at the beginning nothing was really working. I mean, late 2012, early 2013, the main server was pretty immature, it wasn't really performing well, a lot of ecosystem components were missing, and there was no real dashboarding solution yet, and so on. But as time went on, I think we took quite some liberties there in just pushing this project on, and it became better and better. And I would say probably one and a half years in, we had the main server that collects the time series and makes them queryable.
We had that pretty mature and stable.
We had Promdash, which is the Prometheus dashboard builder.
So finally, people were actually able to build dashboards on top of the data that they collected.
And we also had one of our really first killer use cases where we got instrumentation about all the containers that were running on Bazooka or in-house Heroku system.
So you could get, for every application, revision, and proc type, keyed by those dimensions and more actually, the current CPU usage, the memory usage, the memory limit, and so on and so on.
And that really started convincing people that this was really worth it.
And then I think that was kind of the tipping point where shortly after the strategic bet was made in SoundCloud to really switch to that.
And in terms of open sourcing, that was interesting because when we started this initially,
we just put it up on GitHub, without asking anyone, in its own organization. So it's kind of a weird status, I guess. It was a private project, and it still arguably is. I mean, it was definitely started in our free time; Matt even started before he joined SoundCloud. And we've been trying since then
to keep it as independent as possible
from any single company.
So we really want this to be an open community project
without one company controlling
too much of the direction and so on.
And before, so we put it on GitHub back then,
but we really didn't make any noise about it.
So we only told a couple of friends, especially also other ex-Googlers.
So I guess I have to say Prometheus is kind of inspired by a lot of what we learned about monitoring at Google.
And a lot of people who quit Google then either asked us, hey, do you know anything similar?
Or they just discovered Prometheus and kind of noticed that it was very similar to what they'd been used to. So before we even, you know, went more public about Prometheus, we had kind of an insider circle of people using it and testing it already. One of our ex-colleagues from SoundCloud who then went to Docker started using it at Docker.
And another colleague used it at BoxEver,
which is a Dublin-based company.
And so he's in Dublin.
And in terms of open sourcing, so it was open source, but only in the beginning of this year, for the record,
since it's a podcast, this year is 2015.
In January, we decided, okay, it's finally ready enough to share with a broader audience. So just leading up to that, we had a lot of discussions with, you know, internal departments about how we should communicate this and what the legal status around that was. In the end, everything was pretty relaxed, and we had blog posts on the SoundCloud Backstage blog and on BoxEver's blog, and I think on my Docker colleague's private blog back then. And yeah, and then it really took off.
So it took some work then, though. It took some commitment from you and Matt and others that were sort of seeing the light of where this could go.
I was going to say, did you just run it concurrently alongside your StatsD stuff until it showed its value?
And then you were able to eventually cut over?
Or are you still running your StatsD stuff as well?
So, yeah, that's what we did.
And StatsD is still running because, you know, you never turn off old systems in practice.
But practically nobody
is using that anymore; very few people are using that. So if you're building a new service at SoundCloud, it's going to use Prometheus. There's some legacy stuff on StatsD and Graphite still, and there's some stuff that was hard to convert. But yeah, for the most part, it's all Prometheus now. And yeah, it's been really a ride,
especially since being more vocal about it
beginning of the year.
We've really, I mean, the community has grown crazily.
We have contributors from all kinds of companies.
We get a lot of contributions.
Basically, we get contributions almost every day, if not multiple times a day.
I think Google's Kubernetes is now natively instrumented with Prometheus metrics, so if you want to monitor Kubernetes, you don't even need any kind of adapter to get Prometheus metrics out of there. You have CoreOS adopting it quite a lot for their components.
So etcd is one notable mention there
that is already sprinkled with Prometheus metrics.
Then you have DigitalOcean completely adopting it
for their internal monitoring right now.
I don't know how much I can say about that,
but I think these are the three companies
where they're like reasonably public about what they're doing with Prometheus.
I know of a bunch more, but I'm not sure how much I can say about those.
Sure. Well, there's definitely tons of details.
Any system that looks to replace a handful of legacy systems will have many moving parts.
And you have an architecture, you have a data model, there's a query language. There are lots of details we want to ask you about, all of them. First, we're going to take a quick sponsor break, hear a word from our awesome sponsor, and then we will be back with all the nitty-gritty details of Prometheus.
We are back. Toptal is by far the best place to work as a freelance software developer. I had a chance to sit down and talk with Breanden Beneshott, the co-founder and COO of Toptal. And I asked Breanden to share some details about the foundation of Toptal, what makes Toptal different, and what makes their network of elite engineers so strong.
Take a listen.
I mean, I'm one of the co-founders, and I'm an engineer.
I studied chemical engineering, and to pay for this super expensive degree, I was freelancing
as a software developer.
And by the time I finished, I realized that being a software developer was pretty awesome.
And so I kept doing that.
And my co-founder is in a similar situation as well.
And so we wanted to solve a problem as engineers and do it as a network of engineers, kind of for engineers, by engineers. And having that perspective and consistently bringing on new team members who also share this really makes Toptal different, in that it's a network of engineers, not kind of like you have Toptal and then the developers. It's never about us and them.
It's always us.
Like everybody at Toptal, for the most part, refers to Toptal as their company, and they feel like it's their company, and everybody acts like a core team member, even though they're freelancers within the Toptal network. And all of these things are extremely important to us.
All right, if you're interested in learning more about what Toptal is all about, head to toptal.com slash developers. That's T-O-P-T-A-L dot com slash developers to learn more. And make sure you tell them the Changelog sent you.
All right, we are back talking to Julius Volz about Prometheus, the monitoring system out of, well, kind of out of SoundCloud.
Maintained by some SoundCloud people, used by SoundCloud and others,
and really making a name for itself in the industry.
Julius, we want to talk to you about the details of Prometheus.
You talked about some of the problems that you guys have run up against
in different systems, and you obviously look to solve those problems with Prometheus.
So maybe take us through the high level points and we'll dig down as we find them interesting,
starting with the architecture. I know it's kind of hard without visualizations,
but if you could lay it out in words, what are all the moving parts and how do they fit together?
Sure. I actually have the advantage that I have the architecture diagram in front of me.
But if you as a podcast listener also want to view it, head over to prometheus.io and scroll down in the overview section.
So I think the heart of Prometheus is the Prometheus server, which is really you run one or multiple of those in your company and you configure it to scrape targets.
So basically services that you're interested in.
Prometheus kind of believes in the church of pull.
That means it pulls data rather than having data sent to it.
And actually we should really go into why we decided to do that
because that's an
interesting religious kind of point, but let's do that later maybe.
So you configure that server to scrape your services, and these services can be one of three different things. It could either be your own service that you can instrument with one of our client libraries, and the client libraries allow you to expose things such as counter metrics, gauges, histograms, and summaries. The latter two are kind of hybrid metric types that give you either, you know, bucketed histograms or quantiles. And so the client libraries give you
programming language objects
that allow you to track counter state and so on
and then also expose it over HTTP
and the Prometheus server then comes by regularly, usually every 15 or 30 seconds or one minute,
or whatever you configure
and scrapes that endpoint,
gets only the current state of the metric.
So there's no history in the client.
It only gets the current state.
So let's say for a counter,
it would just get how many requests have happened
since this service instance started.
And the counter's never reset.
So you could have two totally independent Prometheus servers
scraping the
same target and getting the identical data.
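To make that concrete, here is a minimal sketch of what such an instrumented Go service could look like, using today's Prometheus Go client library API; the metric name, labels, and port are illustrative, not anything specific to SoundCloud's services:

```go
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// A counter with label dimensions. Each distinct label combination
// (e.g. method="GET", status="200") becomes its own time series.
var httpRequestsTotal = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "http_requests_total",
		Help: "Total HTTP requests handled, by method and status.",
	},
	[]string{"method", "status"},
)

func handler(w http.ResponseWriter, r *http.Request) {
	w.Write([]byte("hello"))
	// Just bump an in-memory counter; nothing is sent per request.
	httpRequestsTotal.WithLabelValues(r.Method, "200").Inc()
}

func main() {
	prometheus.MustRegister(httpRequestsTotal)
	http.HandleFunc("/", handler)
	// The Prometheus server scrapes this endpoint periodically and reads
	// the current counter values, e.g. a line such as:
	//   http_requests_total{method="GET",status="200"} 1027
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```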
So Prometheus stores these metrics locally in a local storage. I should say that currently, for the querying,
we only really have a local on-disk storage. Our goal was to have single server nodes which are completely independent of any other thing on the network.
When things really go awry and you need to figure out what's going on during an outage,
you really can go to that one server and look at your metrics without having to
depend on complex distributed backend storage and so on.
We do have experimental support for writing to OpenTSDB and InfluxDB
at the moment, but it's not possible yet to read back from those through Prometheus via
Prometheus' query language.
So if you want to get data out of those again, currently you would still have to then head
to those other systems.
But that's on the long-term roadmap.
We definitely want to have a long-term storage that we can read back from.
The local storage is good for a couple of weeks or maybe even months, maybe longer,
depending on how much data you have.
But it's not really meant as a forever storage.
That's just a simplicity decision just because you guys want it to be simple.
Yeah, on one hand, it's much simpler to implement, of course, than a distributed system.
And we also believe that through the simplicity, hopefully, you'll get more reliability
out of this in the end. So if, let's say, you wanted to have HA, high availability,
you would simply run two identically configured Prometheus servers scraping exactly the same data.
And if one goes down, you still have the other one to go to. But they're not clustered.
So they're completely independent of each other.
And if you want to investigate state during an outage, you just need one of them up.
And you can go to either one and see what's actually happening.
Okay, so normally instrumented jobs are one of the three types of things that Prometheus can collect data from.
But you might also have something like a Linux host machine or HAProxy or Nginx,
things that you cannot easily at least instrument directly.
You probably wouldn't want to go into the Linux kernel
and build a module that exports Prometheus metrics over HTTP, right?
So for that, we have a set of export servers,
we call them exporters,
which are just basically little jobs,
little binaries that you run close
to whatever you're interested in monitoring.
And they know how to extract the native metrics
from that system.
So for example, in the case of the host exporter,
it would go to the proc file system
and give you a lot of information about the networking
and the disks and so on and so on.
And these little exporters then transform
what they collect locally into a set of Prometheus metrics,
which they again expose on an HTTP endpoint
for Prometheus to scrape.
And that's how Prometheus can get information
from these kinds of systems.
And we have a lot of exporters
for all kinds of systems there already.
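As a rough illustration of the exporter pattern (a hypothetical sketch, not code from the real node exporter), an exporter is really just a small HTTP server that reads some system or third-party state on each scrape and republishes it as Prometheus metrics:

```go
package main

import (
	"log"
	"net/http"
	"os"
	"strconv"
	"strings"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
	// A gauge whose value is computed on every scrape by reading the
	// 1-minute load average from /proc/loadavg (Linux-only, illustrative).
	loadAvg := prometheus.NewGaugeFunc(
		prometheus.GaugeOpts{
			Name: "example_node_load1",
			Help: "1-minute load average read from /proc/loadavg.",
		},
		func() float64 {
			data, err := os.ReadFile("/proc/loadavg")
			if err != nil {
				return 0
			}
			fields := strings.Fields(string(data))
			if len(fields) == 0 {
				return 0
			}
			v, _ := strconv.ParseFloat(fields[0], 64)
			return v
		},
	)
	prometheus.MustRegister(loadAvg)

	// The Prometheus server scrapes this endpoint like any other target.
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":9100", nil))
}
```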
Finally, the third kind of thing you might want to monitor
and which can be a challenge is things like batch jobs
or things that are just too short-lived to be exposing metrics
and to be scraped reliably by Prometheus.
So in that case, let's say you have a daily batch job
which deletes some users or so on,
and you want to track the last time it ran successfully
and how many users it deleted.
For that, we have something called the push gateway,
which is kind of the glue between the push and the pull world,
which you're only really supposed to be using
when you really have to.
And the batch job could then push at the end of its run,
usually these metrics, the last run and the deleted users,
to that push gateway.
And the push gateway would simply hold on
to those metrics forever.
And the Prometheus server can then come by
and scrape it from the push gateway.
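A hedged sketch of what that batch-job push could look like with the Go client's push package; the Pushgateway address, job name, and metric names here are made up:

```go
package main

import (
	"log"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/push"
)

func main() {
	// ... the actual batch work would happen here ...
	deleted := 42 // pretend this run deleted 42 users

	deletedUsers := prometheus.NewGauge(prometheus.GaugeOpts{
		Name: "batch_deleted_users",
		Help: "Users deleted by the last cleanup run.",
	})
	deletedUsers.Set(float64(deleted))

	lastSuccess := prometheus.NewGauge(prometheus.GaugeOpts{
		Name: "batch_last_success_timestamp_seconds",
		Help: "Unix timestamp of the last successful cleanup run.",
	})
	lastSuccess.SetToCurrentTime()

	// Push both metrics to the Pushgateway under the job name "user_cleanup";
	// the Prometheus server then scrapes the Pushgateway on its normal schedule.
	err := push.New("http://pushgateway.example.org:9091", "user_cleanup").
		Collector(deletedUsers).
		Collector(lastSuccess).
		Push()
	if err != nil {
		log.Fatalf("could not push to Pushgateway: %v", err)
	}
}
```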
And yeah, so that's kind of the data ingestion side of things.
In the architecture further there,
so after the data is collected and stored,
we can do two interesting things with the data.
We can look at it as a human on the dashboard
or directly on the Prometheus server.
So for dashboarding, we have a couple of solutions.
We have Promdash, the Prometheus dashboard builder.
It's really kind of a UI-based,
click-based dashboard builder, similar to Grafana.
When I started building Promdash, Grafana, to my knowledge, didn't really exist yet or not at all.
But it's roughly comparable to that.
But since then, Grafana has gained Prometheus support as well. And then we also have console templates, which are served directly from the Prometheus server.
That's kind of a power user use case where you can build any kinds of HTML based dashboards.
And these templates then have access to the query language of Prometheus. So they allow you to build even dynamic layouts
depending on the data that you have
in your Prometheus instance.
So that's visualization.
And then the last part that we do in Prometheus
is alerting.
So you have collected a lot of data now
about all your systems, your hosts, and your services. And now you can actually make use of that data to see if something is wrong somewhere, to see if a batch job hasn't run for a while, to see if the request rates of some services are too low or errors are spiking up. And you can actually use the same powerful query language that you use to display stuff.
You can use the same language to formulate alert conditions under which people should get notified.
And since you might have multiple of these Prometheus servers that each compute these alert conditions in the company,
you might want to do some correlation between them
and alert routing and so on.
And that's better done in a central place.
So you'll usually have one or a few alert managers
in your company.
That's a separate binary again that you run usually once, that all the Prometheuses in your organization
send currently firing alerts to.
And the alert manager then can do things like
inhibit one alert if another one is firing.
It knows how to route alerts
based on the key value dimensions on the alerts
to specific notification configurations,
to specific teams and so on.
And it supports a range of notification mechanisms like PagerDuty, email, Slack, and so on.
So that's kind of the overall overview over Prometheus.
Just one question on the visualization side.
What's the purpose of having a separate, like the prom dash aspect and then also built-in
graphing and querying.
Is one for a certain use case
and one for a different use case?
Yeah, definitely.
So the built-in graphing is really more useful
for ad hoc exploration,
really of data that is in one Prometheus server.
And that's good, you know,
even if your prom dash is down
and you really just want to see what's happening
in one Prometheus server, you can go there.
You can do very rudimentary graphing
so it doesn't have all the bells and whistles
that PromDash has, you know, like stacked.
It does have stacked graphs,
but it doesn't have like multiple axes,
multiple expressions in one graph,
different color schemes and things like that.
So it's quite simple,
but it allows you in the worst case
to still explore the data in that Prometheus server.
And Promdash is really a dashboard builder.
So that's for when you really want to persist a dashboard forever
and for other people to see and to share.
And especially it's very useful, let's say,
I think in SoundCloud we have maybe roughly 50 Prometheus servers.
And we have one central promdash installation, which just knows about all these Prometheus servers.
And in there you can then have dashboards or even single graphs where you show time series or query expressions from multiple different servers in one graph.
So yeah, it's more of this nice wall dashboard use case.
Yeah, so the alert management would be part of the built-in UI.
The configuration of your alerts and stuff would be
what you'd use the built-in UI for?
Yeah.
So for alerting, that's partially in the Prometheus server and partially in the alert manager.
Okay.
So in the Prometheus server, you can define rules, basically alerting rules, that get executed, let's say, every 30 seconds or one minute commonly, depending on what you configure. And what happens there is that it really just executes a query expression
and sees if there are any results from that expression.
We maybe should go a bit into how the query language works.
And if there are any results from that expression,
they get transformed into labeled alerts
and get transferred to the alert manager
where they can then be deduped, silenced, routed, and so on.
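As a rough, made-up example of such an alerting expression (the exact rule file syntax has changed since this 2015 conversation, so take this as the shape of the idea rather than the literal format):

```promql
# Evaluated on a schedule, e.g. every 30 seconds. Every series that this
# expression returns (one per job/instance/... label combination) becomes
# a labeled alert that is sent on to the alert manager.
rate(http_requests_total{status=~"5.."}[5m]) > 10
```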
And this is kind of interesting
because this whole labeled key value data model
goes all the way from the instrumented services
to the storage, to the querying,
and all the way to the alert manager.
So you really have that chain of dimensional information
to work with at every point in the chain.
Yeah, it sounds like everything builds off the query language
and the query language builds off of the data model.
Exactly.
So maybe the data model is probably the next place to dig in
and tell us what it is, how it all works,
and maybe if that's unique to Prometheus
or something you took from somewhere else.
Just go into the details on how the actual data is modeled.
Sure.
So Prometheus stores time series.
And time series have a metric name,
and they have a set of key value dimensions,
which we just call labels.
So you might have something like a metric name, HTTP requests total, which tracks the total number of HTTP requests that have been handled by a certain service instance since it started.
But then you might be interested in drilling down, right? You would want to know which of these are GET requests, which path handlers have been hit, and so on. And for that you can use the label dimensions. So for example, you might have method equals GET on there, and you might have status equals 200 for the outcome, and so on. These dimensions then get stored, and they allow you to query time series by these dimensions. So you could say, you know, sum over all the dimensions except the status code dimension; then you would get the total number of requests over all your service instances, but keyed by the status code, so that dimension would be preserved. Or you could just select a specific dimension. Or you can even do... so let's say you have one metric and you have all these kind of sub-dimensional instantiations of that metric.
You know, one for method equals get, one for method equals put,
and then under these you have, you know, the other labeled dimensions.
So for one metric name, you actually get a lot of time series with all these different label sets.
And now if I just query for just the metric name,
I get all these time series back.
If I don't filter, if I don't aggregate and so on.
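To picture that with the hypothetical HTTP requests metric from above, the stored series and a couple of queries over them might look roughly like this (today's query syntax, illustrative names):

```promql
# One metric name, many series, one per label combination, for example:
#   http_requests_total{job="api-server", instance="10.0.1.1:8080", method="GET", status="200"}
#   http_requests_total{job="api-server", instance="10.0.1.1:8080", method="GET", status="500"}
#   http_requests_total{job="api-server", instance="10.0.1.2:8080", method="POST", status="200"}

# Selecting just the metric name returns all of those series:
http_requests_total

# Summing away every dimension except the status code keeps one series
# per status value, aggregated across all instances and methods:
sum by (status) (http_requests_total)
```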
And that can be very useful.
So let's say on Bazooka, we have a use case where we have one set of these time series just describing for every instance running on Bazooka.
What is the memory limit?
How much memory can it use
before the cluster manager kills it, right?
And we have another metric called
basically the current memory usage.
And if we just have these two metric names,
we can actually, in the query language,
just put a minus in between them
to subtract the current
usage from the limit to get kind of the headroom you know the the memory that they can use still
use before they would get killed if we wanted to know like how well do instances utilize their
memory and what would actually happen if we just put a minus between these two metric names
is that not only a single number there's not only a single number on the left
or a single number on the right,
but you have these whole, let's say, vectors of time series
on each side of this binary obturation,
and they get matched on identical label sets.
So the usage of one instance is matched
with the limit of another instance and so on and so on
and in the end as the output of the expression you get again the current headroom per instance
with all the dimensional labels still preserved and you know you can do go more fancy than that
you don't need to have an exact match there there's like several language constructs that allow you to do one to n or n to two one matches and so on and and specify how
exactly to match things but this kind of vector-based matching algebra i think is
quite unique to prometheus at least in the open source world yeah so the you give it a name and
then a series of labels.
And it sounds like the labels,
that's what you refer to as the multidimensional aspect
because each label you add adds a dimension
to that particular time series.
And then your guys' built-in querying for that construct
is really where it sounds like the flexibility is coming from.
Am I following you?
Yep, that's totally correct. And maybe one word of warning for the labels: they're really meant to be kind of dimensions, but they're not meant to be of arbitrary cardinality. So let's say,
if you wanted to store a user id of a service with millions of users, you probably would not want to use a label value for that
because you would suddenly get millions of time series
for this one metric.
So you really have to be aware of that.
Every combination of labels on a metric
creates one new time series automatically.
And these time series are indexed and so on
and they need
to be managed. So if you really want to have that kind of arbitrarily high-cardinality dimensional insight, like storing email addresses or storing user IDs and so on, or the content of MySQL queries, the actual query string,
then you're probably better served with something like a log-based system,
InfluxDB or Elasticsearch and so on,
that really can store individual events, individual things with arbitrary metadata.
So I can see where, with the labels, there are better and worse practices. Whereas, you know, with more of just a key-value namespacing thing, it's pretty easy to just come up with the next name and drill down one dimension. But as you add dimensions, I can see where it gets difficult, and you're in fact warning against things not to do. Is there a place to go where it's like, hey, how would I do this in a typical situation? Because I think across many organizations the types of metrics are similar. Do you guys have best practices, or things you've learned at SoundCloud, best ways to use Prometheus labels?
Oh yeah, definitely. So we actually have a whole section on best practices at the very bottom of our website, about metric and label naming
and how to build good consoles, dashboards,
and alerting and so on.
I think, yeah, one thing that really just happened
sometimes at SoundCloud is that people mistakenly,
either by not yet knowing the Prometheus data model
well enough or just by making a simple mistake in the code,
have set some of these label dimensions,
let's say, to a track ID or a user ID.
And that then creates millions and millions of time series.
I mean, a single Prometheus server
can handle millions of time series.
But if you just overdo it a bit
and you're not careful about what you stick into label values,
then you can really easily blow up a Prometheus server.
So keep those label dimensions to sane, bounded things.
So Prometheus automatically attaches some of them anyways.
So you get the name of the job, which is kind of the name of the service.
It's just terminology, I guess.
The name of the service, which we call job,
the host and port of the instance by default.
And that already gives you some dimensionality,
even if you don't have any labels
on the side of your service, right?
So you at least get, if you have 100 instances,
you get 100 time series for this one metric, which could be the
number of HTTP requests. And then you have to multiply that by all the other dimensions that
you add. For a single metric, you can easily end up with, you know, thousands or even 10,000 time series.
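As a rough worked example of how that multiplication plays out (all numbers made up):

```
100 instances
  x 5 HTTP methods  (method label)
  x 8 status codes  (status label)
  = 4,000 time series for this one metric name
```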
Well, certainly lots of moving parts when we talk about Prometheus. So I'm going to assume that, based on this conversation, many people are like, I want to try it out.
I want to get started.
So we're going to take a quick break.
And when we come back, we're going to talk about just that.
We'll be right back.
I have yet to meet a single person who doesn't love DigitalOcean.
If you've tried DigitalOcean, you know how awesome it is.
And here at the Changelog, everything we have runs on blazing fast SSD cloud servers from DigitalOcean.
And I want you to use the code CHANGELOG when you sign up today
to get a free month.
Run a server with 1GB of RAM and 30GB of SSD drive space totally for free on DigitalOcean.
Use the code CHANGELOG. Again, that code is CHANGELOG. Use that when you sign up for a new account. Head to digitalocean.com to sign up and tell them the Changelog sent you.
All right, we're back with Julius Volz talking about Prometheus. And while we were on that break, we realized that getting started is a good step to go towards next.
But we forgot.
We want to kind of go back a little bit on this religious piece of push versus pull when it comes to Prometheus.
So Julius, why don't you lead us through that piece there?
Sure.
So this is funny because it's a bit of a religious thing.
And, you know, pull can be sometimes better, sometimes push is better, depending on the type of environment you're using Prometheus in. But one of our team members even wrote a blog post about push versus pull for monitoring; he's Brian, from Dublin. And you can find that in our FAQ, actually. But I think some points are interesting.
So I think first, let's start with one advantage of push.
Push is really easy to get through firewalls
if your monitoring system
is easily reachable from everywhere.
You only need to make one point,
one network point available on the internet
or in your local,
in your company's network or whatever.
And then everyone just needs to be able
to push somehow to that.
With pull, sometimes people run into the problem
that let's say, you know, if they have setups
where they need to pull from various endpoints
on the internet and they should be secured and so on,
you know, they have to have a bit more,
they need to now secure and make available
N endpoints instead of one.
So that's often what pains people
when they can't use pull.
But for us, especially in these kind of
modern web company environments
where you have your own data centers
or your own virtual private
clouds, and you have internally trusted environments where you can just pull from every target, pull really has a number of advantages. So one thing that's really, really nice is that you can just manually go, per HTTP, to a target and get the current state of the target.
So by default, if you go to a Prometheus endpoint on a service,
you will get a text-based format that will tell you the current state of all the metrics.
And you don't even need a server for that.
So that's one nice thing.
You can run a complete copy of production monitoring on your laptop or anywhere.
You can just bring up a second copy of all of it
to do experiments, to try out new alerting rules and so on.
And that copy will get the exact same data
as your production version of monitoring
without you having to configure the actual services
to send data somewhere else.
And we kind of argued that if you're doing service monitoring and alerting, you kind
of need to know, your monitoring system kind of needs to know anyways where your services
live and which services should currently be there.
Because otherwise it can't really alert you about a target being down or so on, because it doesn't know if it should be gone, if it was deprovisioned, or, you know, if it is just crash-looping, for example. So with that kind of argument, the monitoring system should know what your targets are anyways, so the knowledge is already there. That also makes it easier to pull the data
and makes it easier to tell in monitoring and alerting
whether a target is currently down.
And yeah, so we don't think otherwise
that it's like a huge issue,
whether you do push or pull,
especially in terms of scalability,
it doesn't really matter that much.
But yeah, it kind of depends on your environment.
I think there would be some scalability aspects of pulling as you had more
services, more hosts. I guess
you had your StatsD servers dropping UDP packets. It seems like
catching UDP packets is a lot easier than going out and requesting data.
Have you found in practice that that's just not a big issue?
Yeah, so that's actually an interesting point.
So that's really not an issue at all.
So the actual pulling side of things has never been a bottleneck for us.
But it's also very important to point out here that the whole fundamental way of how data is transferred is quite different in the StatsD model
to the Prometheus model.
As I said earlier, in the StatsD model,
you send UDP packets basically proportionally
to the amount of user traffic you get, right?
Like for every HTTP request or every 10th or so on,
you send a UDP packet,
please count this, please count this, please count this.
Why don't you just increment a number in memory
on your web server?
And then every 15 seconds or so,
transfer the current counter state.
So that's Prometheus' philosophy.
The nice thing is there, it uses way less traffic,
like orders of magnitude, less traffic.
It uses less computation in the client, especially
if you have services that do many thousands or even more requests per second. You might have some
multi-core high-performance request routers which can do hundreds of thousands or more requests.
And sending a UDP packet for every request would actually be quite
prohibitive. And the other thing is that if these counter UDP packets in the StatsD world get lost, you just get a lower total request rate displayed in your monitoring system, and you have no clue that these packets were actually lost. With the Prometheus model, if a scrape fails one time, it doesn't really matter so much, because, let's say, the next scrape works; you will still not lose any of these counter increments that have happened, because they are tracked on the service side.
In every instance, these counters are just continuously incrementing from the start of the instance.
And every time I come by, I just see what's the current state.
And that's also a very good argument
for not doing any kind of rate pre-computation
on the service side,
but doing that on the Prometheus server side.
So in your service, really just count things up.
Don't expose rates.
Because let's say if you do expose rates,
they're just kind of a derivative of a counter.
Then you might really, if you miss a scrape,
then you might really miss a peak in a rate.
And if you miss a scrape with a counter,
you just get a bit worse time resolution over that data,
but you would never miss any increments of that counter afterwards.
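So the service only ever exposes the raw, ever-increasing counter, and the rate is derived at query time on the Prometheus side, along these lines (metric name again just illustrative):

```promql
# Per-second request rate over the last 5 minutes, computed from the raw
# counter; missing one scrape only coarsens the resolution, it does not
# lose any increments.
rate(http_requests_total[5m])
```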
That makes sense.
That certainly makes sense on the theory of why you go which path,
because on one side you can lose data,
on the other side you're just kind of missing some time.
Exactly, yeah.
And that was actually interesting.
The way I fixed this whole StatsD dilemma
before we had Prometheus in SoundCloud was actually quite similar to what Prometheus is doing now.
So I actually put a local StatsD on every host where services were running and services were just sending local UDP packets to those StatsDs.
And then these local StatsDs would pre-aggregate those counters over, you know, half a second or so, and then send that resulting counter to the global StatsD.
So that's kind of similar.
You're already kind of moving the aggregation to the individual hosts, but you're not having it in the same process.
And Prometheus is even moving that into your process and into your memory space.
And yeah, you don't need to create a network packet just to count a request or something else.
I don't know if that's interesting.
There are other types of metrics
that Prometheus supports besides counters.
So we have gauges.
Maybe I should go into what these are
depending on where you want to go now.
I think it would be awesome to go that much deeper,
but I think we're getting close to our time.
So what I'd like to do is cap there.
Maybe you will write an awesome blog post
and we'll dive deeper into that or something like that,
or maybe we can have you back on at some point.
But I think at this point, let's dive into getting started.
So for those that are going to Prometheus
and thinking like, man, this is really awesome.
I want to check this out.
If you go into the documentation area, there's a getting started.
I think that's actually what the button on the homepage takes you to.
Is that right?
The getting started button.
Yes, it takes you right there.
So if you go to Prometheus.io and you click the button on the homepage, which says get started, you actually
get started, which is kind of nice. But you get this really awesome guide, a Hello World-style
guide that sort of takes you through from zero to running a Prometheus server. So what is it like
to get started, I guess, maybe moving away from other monitoring services? Can you walk through
some of the pains potentially or the process to get started with Prometheus?
Sure.
I think one of the most consistent feedbacks we have gotten about Prometheus
is how easy it is to get started.
So that's actually quite nice.
The reason is that Prometheus is written mostly in Go.
I mean, the server is written completely in Go.
There are client libraries for different languages and so on.
But especially the server being written in Go, and Go producing, you know, statically compiled binaries that you
can just deploy on a machine without having to think about, you know, runtimes or shared libraries
and so on. That makes it very easy to get started and deploy. We have pre-built binaries that you can download for
the major architectures. It's also very easy with our make file to download all dependencies in a
hermetically contained environment, to just start building it from head yourself or from some
release version if you want to.
You need to create a configuration file; there's one in the getting started guide here, of course. That's just one file you point it to, and then by default Prometheus will just store all your data in a local directory, and it will just start scraping data. So, I mean, if you're fast, it takes maybe five minutes to get started, and then you have a running Prometheus server.
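For reference, a minimal configuration in today's YAML format looks roughly like this; treat it as a sketch rather than the exact file from the guide, since the format has evolved a bit since 2015:

```yaml
# prometheus.yml
global:
  scrape_interval: 15s        # how often to scrape targets by default

scrape_configs:
  # Prometheus scraping its own /metrics endpoint, as described below.
  - job_name: prometheus
    static_configs:
      - targets: ['localhost:9090']
```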
Of course, for that to be interesting, you need some example services that you can scrape and so on, and there, of course, it depends a bit on what you want to instrument.
Promdash is the one exception in the whole ecosystem which is not written in Go. It's actually a Rails application, but it's really more of a light backend. I mean, the whole Rails backend really only stores the dashboards as JSON blobs and could theoretically pretty easily be replaced by something else; all the logic is in the JavaScript frontend. But we have Docker containers for everything as well, for all the components. So if you really feel like, oh, I really don't want to set up Rails, you know, just use the Promdash Docker container, and hopefully that will be less painful.
But I mean, it's basically as easy as this: you need to download the latest binary, unpack it, drop in a config file, and just start it, and it's running. And by default, one of the default configuration files here is set up in such a way that Prometheus collects data
on its own metrics exposition endpoint.
So Prometheus instruments itself
via one of the Prometheus client libraries.
So it can monitor itself, basically.
So that's a nice use case to get started
if you just want to look at
some very simple Prometheus metrics
without having any services.
Another thing that's really nice to get started with, because everyone has this, is the node exporter, which, by the way, has nothing to do with Node.js, but with a host. So the node exporter is a host exporter; it exports host metrics. And that's a really nice thing to get started with. You just start it. I mean, you can set a lot of command line flags, but if you don't specify anything, by default it will do the right thing, and you configure Prometheus to scrape that, either statically or via some kind of service discovery. And yeah, then you get host metrics about either your local machine or your data center machines and so on. That's pretty easy, too.
While we're talking about getting started, I've got to imagine that people are saying, okay, when I get started, I also want to have a community to sort of hang around. So you've got a Twitter handle, of course, you've got a mailing list, and you've got IRC. So those are three ways that people can hang out and sort of catch up.
I was on the mailing list recently and just see that it's pretty lively and
active.
So when you're getting started,
if you have any questions,
then there's this mailing list to look at as well,
which will link up in the show notes,
of course.
And definitely stop by the IRC channel.
So we're there basically every day,
very active.
A lot of people are coming there
asking questions and we're always super happy to answer. And yeah, so that's kind of the fastest
channel to reach us. And the mailing list is good for longer questions and more persistent
communication. So it's Prometheus on Freenode and and then Prometheus Developers as a Google group,
which we'll link out to so you don't have to worry about trying to say that URL.
That's not readable.
That's not pretty.
Which URL?
Well, the Google group.
It's not quite as easy to spread out.
Yeah, you don't want to read that out in a podcast, no.
No, that's boring.
Changelog.com slash 168.
You'll get all the links.
We even found that blog post of Push versus Pull that he referenced,
so we'll have that in there as well.
Or just head over to prometheus.io, click on the Community tab,
and you have all the channels there.
We're very, very happy about any contributors,
and I think who we could especially use,
because we're all backend people,
is someone who really likes doing frontend stuff.
That's traditionally what's always lacking
in these kinds of infrastructure projects.
That's a good segue there, Jared, to the call to arms then.
That's right. Sounds like one.
Julius, if you were going to request help
or give a call to arms to the open source community, would you say front-end developers is what we're after?
What would you say to the open source community, how we can help you out?
Yeah, in general, it would be great to have more front-end interested people in the infrastructure world, right?
And that goes for Prometheus as well.
We've been coding a lot of the, you know, Promdash is very
front-endy and the
graphing interface in Prometheus itself.
But
it would be really great to
get people who feel like really strongly
about infrastructure and
nice front-ends and help
us, you know, refactor a lot of things
there, improve the UI,
make it shiny.
That's definitely always a nice thing to have. But, you know, any other kinds of contributions are great too. I think two of the areas that are currently still lacking and that will get the most attention in the future are the alert manager, which we are currently redesigning and re-implementing over the next months to be more production-ready and more powerful, but also some kind of long-term storage integration. So we have these ways of writing out data to, currently, OpenTSDB or InfluxDB, but it would be really great to have a full read-back implementation where you can query the long-term storage through the Prometheus server again. And, you know, if someone either wants to implement that for an existing backend system, or wants to even maybe create a completely new Prometheus-specific long-term storage, that would be interesting as well. But there's a lot of stuff to do. Maybe head to the different issue trackers
on the various Prometheus GitHub projects,
which are all under github.com slash Prometheus.
And check out if there's anything
that looks interesting to you.
So there you have it.
Sounds like lots of different ways to get involved.
And while we're asking our closing questions,
Julius, we would be remiss not to ask
the one everybody loves, which is who is your programming hero?
I hoped you would not ask that one.
Okay.
No.
Bjorn.
Definitely Bjorn.
There you go.
Bjorn is one of my partners in crime on Prometheus.
We're quite a bunch now, actually. Actually, this is funny because
we also hired an intern
right now who we are going to
transform to be a full-timer
at SoundCloud. And we found him
through Prometheus contributions.
And he's very young, like 23,
and he outcodes me
every day. He's very,
very, very smart.
Every day, I'm astounded by the...
What's his name?
Fabian. And yeah, I'm every day astounded by the quality and the quantity of his coding, but also of his communication in the community. A really, really great person.
I guess more in terms of traditional programming heroes: when I was a child, I had a bit of a coding crush on John Carmack, you know, with the early id games in the 90s, Doom and so on. Definitely, in the Go community, Rob Pike. And you probably have heard of, or even met at GopherCon, Dmitry Vyukov.
He's from Google.
He's not really on the Go team, but he's on the Dynamic Tools team of Google.
But he has contributed so many awesome, awesome features to the Go runtime and tooling around that. The race detector, the new tracing framework,
this fuzzing framework that also just now found
actually a bug in Prometheus's query language,
really great.
And a lot of these really hardcore tools
for getting dynamic information about your code.
And he found hundreds of bugs with that.
So I was really impressed when I heard about that.
And he also gave a really great talk about that
at GopherCon that I can highly recommend.
Yeah, that was another one I didn't mention
at the top of the show.
Ben Johnson and his open source database stuff.
And then Dmitry, and specifically his talk at GopherCon, like you just said, was one that everybody was kind of raving about as they came out of the conference room. So you're not the only one who thinks he's pretty awesome.
Yep.
All right, Julius, well, it was great having you on the show. Definitely something we've been wanting to do for a while, to get you on the show to talk about Prometheus and everything it's doing and what you're all doing at SoundCloud.
So definitely fun having you on the show today.
I want to thank our awesome sponsors for the show, CodeShip, Toptal, and DigitalOcean, for making this show possible.
I also want to thank our awesome listeners and remind everyone that's not a member yet that we are member supported.
You can join the community and get access to the members only Slack channel as
well as many other awesome benefits of supporting the Changelog. Go to changelog.com slash membership. And while you're there, you might as well sign up for Changelog Weekly and Changelog Nightly, which are our weekly and nightly emails, respectively at slash weekly and slash nightly.
Jared, what's next week's show?
We do have one show scheduled. What is next week's show... don't put me on the spot, man. I think it might be Ben Johnson. I know we've got a couple of database shows; I know he's coming up, and he records on August 14th, so we'll have a show between him and now, but we don't know who it is.
We don't know who it is.
Okay, we're going to try and tease out what the next show is,
but nonetheless, we have lots of awesome shows coming up soon.
But until then, let's say goodbye.
See ya.
See ya.
Thank you. We'll see you next time.