The Changelog: Software Development, Open Source - Prometheus and service monitoring (Interview)

Episode Date: August 7, 2015

Julius Volz from SoundCloud joined the show to talk about Prometheus, an open-source service monitoring system written in Go....

Transcript
Welcome back, everyone. This is The Changelog, and I'm your host, Adam Stacoviak. This is episode 168, and we're joined today by Julius Volz from SoundCloud to talk about Prometheus, an open-source service monitoring system written in Go. Super awesome conversation today. We talked about the data model, the query language, and all the in-betweens. We have three awesome sponsors for the show: Codeship, Toptal, and DigitalOcean. Our first sponsor is Codeship. They're a hosted continuous delivery service focusing on speed, security, and customizability.
And they've launched a brand new feature called Organizations. Now you can create teams, set permissions for specific team members, and improve collaboration in your continuous delivery workflows. Maintain centralized control over your organization's projects and teams with Codeship's new Organizations plans. You can save 20% off any premium plan you choose for three months by using the code TheChangelogPodcast. All right, everybody, we're back. We've got a great show lined up today, one we've actually been waiting on for a bit. It was recommended by Peter Bourgon; we just talked about him and Go kit and GopherCon and all that stuff, but Peter was recommending this, Jared. Our last guest was saying that
Starting point is 00:01:42 their you know prometheus was their tech to play with, so we had to get Julius Volz on the line here. So, Julius, welcome to the show. Hi. Pleasure to be here. And also, we got Jared hanging out in the wings there. Say what's up, Jared. What's up, Jared? So, Jared, we were at Go4Call not long ago, so we met Julius and also Bjorn, who couldn't make this call, but we were excited to finally get a chance to get Prometheus and this conversation talking about metrics tracking and stuff like that on this show. So what's the best way to open this one up? You want to talk about Julius a bit or you want to go right into the tech?
Starting point is 00:02:19 Well, first let me say that we kind of did the hallway track at GopherCon, and we were out interviewing people and talking with everybody. And there was two things people were excited about. One was Ben Johnson, who we lined up to come out here pretty soon, and the stuff that he's been up to. And the other one that everybody was excited about was Prometheus. Yes, true. In fact, I think, Julius, you guys even got a shout-out during one of the keynotes. Is that correct?
Starting point is 00:02:42 Yeah, we got a bunch of shout-outs. I think from Peter's talk, from Tomasz's talk, the keynote. So yeah, really, really exciting. Very cool. So we're excited to hear about it. We want to know all the details, but I think, Adam, maybe if we start with the history,
Starting point is 00:02:56 we can kind of see why Prometheus even exists. Do you want to start there? Let's do that. So Julius, you've been with SoundCloud for a bit. Before that, you were with Google. What was going on to make Prometheus a thing for you? Yeah, so when I was at Google, I was actually doing something completely different. I was in Google's production offline storage system.
Starting point is 00:03:20 So basically, we had many tens of data centers with huge tape libraries backing up all production data that Google had. So basically an exabyte scale backup system globally. So monitoring wasn't really my specialty there, but I definitely came in contact with it as a site reliability engineer on that service. And when I left Google and joined SoundCloud back in 2012, it went as it often goes. When Googlers left Google at around that time especially, they felt a bit naked in terms of what the open source world provided them in terms of infrastructure.
Starting point is 00:04:05 Because at Google you have an awesome cluster scheduler, you've got awesome monitoring systems, awesome storage systems, and so on. Suddenly, you get thrown out into the wild and you miss all of that stuff, and you feel just this urgent need to be building a lot of that yourself again. But when I joined SoundCloud,
Starting point is 00:04:23 a month prior to that, another ex-Googler was also joining SoundCloud a month prior to that another ex-Googler was also joining SoundCloud Matt Proud and he felt even more strongly about this and he was particularly unhappy with the state of open source monitoring systems so he had actually already in his free time
Starting point is 00:04:39 started building client libraries for instrumenting services with metrics. And his grand vision was to build a whole monitoring system. So when I joined a month later, he kind of pulled me on board and we started building something in our free time that eventually became Prometheus. So just in the first months, end of 2012,
Starting point is 00:05:04 that was really just our free time. Finally, we got enough of it working in such a way that we could expose data from services, collect it, query it, and maybe even show it in a graph. And that was the point when we decided, okay, this is actually going somewhere. Let's give this a name. Let's call it Prometheus. And briefly afterwards, we started formally introducing that at SoundCloud. And yeah, nowadays it has become SoundCloud's standard monitoring system and time series database. Now, deep topic aside, I got to ask the question, which is one of my favorite movies out there by ridley scott is a movie called prometheus is there any correlation i have never watched that movie actually well we see aliens come out of the code at some point right so that that was actually funny i think it actually came
Starting point is 00:05:56 out around the same time okay but it wasn't really on my radar back then um i think i just briefly had heard about it but it wasn't really any it wasn't really connected to this okay yeah all right yeah prometheus uh the movie came out in 2012 and i remember loving the name and not loving the movie so much adam so maybe that's a separate show but we could yeah we could go i heard i heard a lot of bad things about that we could pause this for a minute and let me rant. We could just go start another show. I'm just kidding.
Starting point is 00:06:28 Maybe I should go a bit more into what we had at SoundCloud back then, because that was kind of the big motivation to build Prometheus. Well, you said that you felt naked. As a Googler, you felt naked coming out of Google and some of the things missing. So this was obviously one of those things missing. Right, yeah. But you might ask, there were many open source monitoring systems, right?
Starting point is 00:06:50 Why were we not happy with those? We're asking that question. We like that question. I actually had that question queued up for you. Yes, that's the next question. Cool. So, I mean, back then, SoundCloud was doing this migration that a lot of companies do, migrating from one monolithic web application to a set of microservices just because the initial monolithic application has grown too big, too complex.
Starting point is 00:07:14 People don't want to maintain it anymore. You can't have independent groups deploying independent things. So SoundCloud pretty early on actually started adopting Go and built their own kind of Heroku-style in-house cluster scheduler called Bazooka. And that was already a container scheduling system, a very early form. We're still using that actually before Docker came out, beforeubernetes and so on came out and the challenge was now that we had these hundreds of microservices running on these bazooka clusters with thousands of instances and developers whenever they built a new revision maybe every day even scaled down the old revision and scaled up the new revisions and all these instances would land on random hosts and on random ports. And somehow we needed to monitor them.
Starting point is 00:08:09 So what we did back then was, what SoundCloud did back then was use StatsD and Graphite as the main time series based monitoring system. So StatsD and Graphite had several problems. So when I joined, I remember the StatsD server almost falling over because it was a single threaded node application running on a
Starting point is 00:08:34 huge beefy machine, but it could only use one core. So it was actually throwing away UDP packets left and right. I don't know if you know how StatsD works. The general working model is that let's say you have a set of web servers, let's say an API server, and you have 100 instances of that. Then if you want to count the number of HTTP requests that happen in that entire service,
Starting point is 00:09:01 every one of these instances for every request that they handle, send a UDP packet to StatsD and StatsD will count from all these hundred instances, will count up all these counter packets from these different instances over usually a 10 seconds interval and then finally sum them all up and write a single data point out to Graphite at the time. So Graphite is a time series storage system. And StatsD is kind of in front of it to aggregate counter data into a final count per 10 seconds. And you can do some stuff there. Like you can say on the service side, please only send every 10th UDP packet or something.
Starting point is 00:09:47 So you alleviate the load somewhat. But the main pattern is here that you're doing the counting in the StatsD site. And yeah, that StatsD wasn't really scalable. It was throwing away UDP packets, wasn't really working that well anymore. And the other problem was Graphite's inherent data model so in graphite if you store a metric a time series it's only a single metric name with no dimensions so it has some dots in the middle that allow you to separate components of a metric name. And people use that to encode implicit dimensions. So for example, you might have a metric named API.http.get.200 to count the successfully handled get requests of an API server.
Starting point is 00:10:43 And that works, kind of. It doesn't scale too well. Graphite doesn't deal very well with you going wild with these dimensions. It doesn't allow you in the query language to be particularly flexible about how you query for these dimensions, and they're also implicit.
Starting point is 00:11:00 So you look at one of these dot separated components, and you can kind of guess what what it would mean but you only see the value you don't see the key usually another problem there was that due to this limited dimensionality it was really hard to figure out which particular host or which particular service instance a metric was coming from. So let's say you have a global latency spike. So if you have these counters over 100 instances,
Starting point is 00:11:35 they all get counted into one metric in the end, and you don't really see if there's a spike. Was it only in one instance? Was it in all instances? You can't really drill down there anymore some teams have actually then encoded the instance and the port like the host and the port of an instance into the metric name into one of these dot separated components but graphite is not really meant for that and and it blew up pretty quickly so they had to run their own graphite
Starting point is 00:12:06 server but that is not particularly fun because graphite is not so fun to run either. So yeah these were kind of the problems we encountered with the StatsD and graphite combination that was for service monitoring. So when I say monitoring actually I kind of mean I mean different people mean different things with that. I mean, both time series collection and trending and alerting. Some people, when they say monitoring, they think of only something like Nagios, only something that alerts people. See, Jared? What?
Starting point is 00:12:38 Did you hear that? Nagios. Oh, how do you pronounce it? That's the European take on it. I took a break there to butt in, but pre-call, you can mention it, Jared, but he set up some Nagios servers. So anyways, you said Nagios. So Nagios is the way you pronounce it. Well, I don't know how to pronounce it. That's just, you know, I used to be a network administrator back in the day, and I was the only one doing it, so you never say it out loud. But I just thought it was Nagios, because it nags you all the time.
Starting point is 00:13:09 I thought they had a play on words. That makes so much more sense. But Nagios could be right. I don't know. See? That makes sense. But yeah, when I think of alerting, I think of something more like that. But you actually say service monitoring. You include alerting in your definition.
Starting point is 00:13:24 Is that what you're saying? include i include time series collection i include uh the graphing i include the alerting so the whole complex of getting metrics from your systems and and acting on it and notifying someone okay um it's it's kind of just a question of definition i guess sure um yeah so so we used Nagios, Nagios back then. Partially just running completely these stateless checks that you run on a host to see if things are good right now. And partially based on graphite based time series. And yeah, that was fine. But Nagios is kind of also from the nineties. It's,. Its data model is very limited. I mean, it knows about hosts and services on those hosts.
Starting point is 00:14:09 And if you have something like a cluster-wide check or things that just don't fit into that pattern, you kind of have to squeeze them into that pattern. And that sometimes works, sometimes not that great. It's really hard to silence by any arbitrary dimensions in Nagios. So yeah, the data model there is also a bit painful. The UI, I think we don't even need to talk about. Nowadays, we're actually using Icinga, which has a bit better UI.
Starting point is 00:14:36 What's it called? Icinga is basically a drop-in replacement for Nagios. So it uses the same database. I don't think you have to change much. It's just kind of a new UI. And I think it has a bit of a different, more scalable mechanism for executing checks. But I'm not really an expert in that area.
Starting point is 00:14:58 Yeah, so that was for service monitoring. And for host monitoring, we had Ganglia. And Ganglia is pretty much completely you know you have the host as a dimensional key there but not much else of course also the metric name but there's no query language there's no nice graphing interface and so on you get these pretty static dashboards with host metrics and um yeah so we used also Nagios, of course, for the host alerting then. This might be a little bit premature, but I just went to the Nagios,
Starting point is 00:15:32 and we're all going to say it different ways, by the way. Nagios, Nagios, Nagios. They say they're the industry standard for IT infrastructure monitoring. What is the goal or what was the goal with Prometheus? Was it to redo what everyone had been doing prometheus was it to you know redo what everyone had been doing not quite so well because you have opinions and you know obviously some some skills to do it but was it is it the goal to sort of unseat some of these existing players or is it
Starting point is 00:15:59 to just sort of like rebuild something new that that made sense for soundcloud uh yeah definitely so uh for us it was the goal to replace statsd to replace graphite to replace nagios in the end with a new kind of ecosystem that is uh more powerful and more integrated and allows you to do more stuff in a more modern way so yeah definitely we we hope to make people depend less on those old tools, I would say. So we kind of sometimes jokingly call it a next generation monitoring system. And it does try to cover all the aspects from instrumenting your services, collecting the data, showing the data in the dashboard, alerting on the data if something is wrong,
Starting point is 00:16:42 and then sending those notifications to you. So yeah, it tries to cover basically the whole field. What it does not do is event-based monitoring. So if you want to do per-request accounting, let's say you want to really collect every individual event, a use case like logging or a use case like elastic search where you can really put every individual record of what happened in there that's not really what we're trying to do prometheus is really in the business of collecting
Starting point is 00:17:16 purely numeric time series that have a metric name and a set of key value dimensions and those the metric name and the key value dimensions uh uniquely identify every series and that you can then actually use together with a query language to do really powerful queries to aggregate and slice and dice based on whatever dimension you're currently interested in during the query actually and yeah so you started building this in your free time or your you and your buddy started building it i'm curious just kind of the the inner workings of soundcloud where they're at with open source and how much freedom they give you as an engineer was this something that you had to sell to your boss or to the company or was it just like well we're doing this now and whatever you guys think is the best solution
Starting point is 00:18:06 must be right? Yeah, so this was definitely an interesting history. I think at the beginning, we just took the liberty ourselves to do that in our free time. There was a lot of resistance at the beginning to introduce that at SoundCloud, which totally makes sense to me,
Starting point is 00:18:21 especially in retrospect, because to be honest at the beginning nothing was really working it was i mean late 2012 early 2013 um the the main server was pretty immature it wasn't really performing well there a lot of ecosystem components were missing um and there was no real dashboarding solution yet and so on but like as time went on we just kind of you know we i think we we took we took quite some liberties there in just pushing this project on and it became better and better and i would say like probably one and a half years in we had the main server that collects the time series and makes them curable.
Starting point is 00:19:05 We had that pretty mature and stable. We had Promdash, which is the Prometheus dashboard builder. So finally, people were actually able to build dashboards on top of the data that they collected. And we also had one of our really first killer use cases where we got instrumentation about all the containers that were running on Bazooka or in-house Heroku system. So you could get for every application revision and proc type keyed by those dimensions and more actually the current CPU usage, the memory usage, the memory limit, and so on and so on. And that really started convincing people that this was really worth it. And then I think that was kind of the tipping point where shortly after the strategic bet was made in SoundCloud to really switch to that. And in terms of open sourcing, that was interesting because when we started this initially,
Starting point is 00:20:03 we just put it up on github without asking anyone on its own organization and um so it's kind of a weird status i guess um it was a private project it's still arguably i mean it was definitely started in in the free time matt even started before he joined soundCloud. And we've been trying since then to keep it as independent as possible from any single company. So we really want this to be an open community project without one company controlling
Starting point is 00:20:36 too much of the direction and so on. And before, so we put it on GitHub back then, but we really didn't make any noise about it. So we only told a couple of friends, especially also other ex-Googlers. So I guess I have to say Prometheus is kind of inspired by a lot of what we learned about monitoring at Google. And a lot of people who quit Google then either asked us, hey, do you know anything similar? Or they just discovered prometheus and kind of noticed that it was very similar to what they've been used to so before we even you
Starting point is 00:21:11 know went more public about prometheus we had a kind of an insider circle of people using it testing it already at one of our ex-colleagues from soundcloud who then went to docker he started using it at Docker. And another colleague used it at BoxEver, which is a Dublin-based company. And so he's in Dublin. And in terms of open sourcing, so it was open source, but only in the beginning of this year, for the record, since it's a podcast, this year is 2015.
Starting point is 00:21:43 In January, we decided, okay, it's finally, it's really ready this year is 2015 uh in january we decided okay it's finally it's really ready enough to share with a broader audience so um just leading up to that we had a lot of discussions with you know internal departments about how we should communicate this and what's the legal status around that in the end everything was pretty relaxed and we had uh you know blog posts on the soundcloud backstage blog and on box everest blog and i think on my docker colleagues uh private blog back then and yeah and and then it really took off and um that was so it took some work then though it took some commitment from you and matt and others that were sort of seeing the light of where this can go. I was going to say, did you just run it concurrently alongside your StatsD stuff until it showed its value?
Starting point is 00:22:31 And then you were able to eventually cut over? Or are you still running your StatsD stuff as well? So, yeah, that's what we did. And StatsD is still running because, you know, you never turn off old systems in practice. But practically nobody is using that anymore very few people are using that so if you're building a new service at soundcloud it's it's going to use prometheus um there's some uh legacy stuff on statsd and graphite still and there's some stuff that was hard to convert but uh yeah for the most part, it's all Prometheus now. And yeah, it's been really a ride,
Starting point is 00:23:07 especially since being more vocal about it beginning of the year. We've really, I mean, the community has grown crazily. We have contributors from all kinds of companies. We get a lot of contributions. Basically, we get contributions almost every day if not multiple i think google is now google's kubernetes is now natively instrumented with prometheus metrics so if you want to monitor kubernetes you don't even need to have kind of any kind of adapter to get
Starting point is 00:23:40 prometheus metrics out of there you have coreOS adopting it quite a lot for their components. So etcd is one notable mention there that is already sprinkled with Prometheus metrics. Then you have DigitalOcean completely adopting it for their internal monitoring right now. I don't know how much I can say about that, but I think these are the three companies where they're like reasonably public about what they're doing with Prometheus.
Starting point is 00:24:08 I know of a bunch more, but I'm not sure how much I can say about those. Sure. Well, there's definitely tons of details. Any system that looks to replace a handful of legacy systems will have many moving parts. And you have an architecture, you have a data model, there's a query language query language there's lots of details we want to ask you about all of them first we're going to take a quick sponsor break uh hear a word from our awesome sponsor and then we will be back with all the nitty-gritty details of prometheus we are back top tile is by far the best place to work as a freelance software developer i had a chance to sit down and talk with Brendan Beneshot, the co-founder and COO of TopTile. And I asked Brendan to share some details about the foundation of TopTile, what makes TopTile different, and what makes their network of elite engineers so strong.
Starting point is 00:24:59 Take a listen. I mean, I'm one of the co-founders, and I'm an engineer. I studied chemical engineering, and to pay for this super expensive degree, I was freelancing as a software developer. And by the time I finished, realized that being a software developer was pretty awesome. And so I kept doing that. And my co-founder is in a similar situation as well. And so we wanted to solve a problem as engineers and do it from as a network of engineers,
Starting point is 00:25:26 kind of for engineers by engineers. And having that perspective and consistently bringing on new team members who also share this really makes TopTel different in that it's a network of engineers, not kind of like you have TopTel and then the developers. It's never about us and them. It's always us. Like everybody at TopTel, for the most part, refers to TopTel as their company, and they feel like it's their company, and everybody acts like a core team member,
Starting point is 00:25:53 even though they're freelancers within the TopTel network. And all of these things are extremely important to us. All right, if you're interested in learning more about what TopTel is all about, head to toptel.com slash developers. That's T-O-P-T-A-L dot com slash developers to learn more. And make sure you tell them the change I'll send you. All right, we are back talking to Julius Bowles about Prometheus, the data monitoring system out of, well, kind of out of SoundCloud.
Starting point is 00:26:25 Maintained by some SoundCloud people, used by SoundCloud and others, and really making a name for itself in the industry. Julius, we want to talk to you about the details of Prometheus. You talked about some of the problems that you guys have run up against in different systems, and you obviously look to solve those problems with Prometheus. So maybe take us through the high level points and we'll dig down as we find them interesting, starting with the architecture. I know it's kind of hard without visualizations, but if you could lay it out in words, what are all the moving parts and how do they fit together?
Starting point is 00:27:00 Sure. I actually have the advantage that I have the architecture diagram in front of me. But if you as a podcast listener also want to view it, head over to prometheus.io and scroll down in the overview section. So I think the heart of Prometheus is the Prometheus server, which is really you run one or multiple of those in your company and you configure it to scrape targets. So basically services that you're interested in. Prometheus is kind of believes in the church of pull. That means it pulls data rather than having data sent to it. And actually we should really go into why we decided to do that because that's an
Starting point is 00:27:45 interesting religious kind of point um but let's do that later maybe um so you configure that server to scrape your services and these services are can can be one of three different things so it could either be your own service that you can instrument with one of our client libraries and the client libraries allow you to expose things such as countermetrics gauges histograms and summaries the latter two are kind of hybrid metric types that give you either you know like bucketed histograms or quantiles and so so the client libraries give you programming language objects that allow you to track counter state and so on
Starting point is 00:28:30 and then also expose it over HTTP and Prometheus server, the Prometheus server then comes by regularly, usually every 15, 30 or one minute or whatever you configure and scrapes that endpoint, gets only the current state of the metric. So there's no history in the client.
Starting point is 00:28:48 It only gets the current state. So let's say for a counter, it would just get how many requests have happened since this service instance started. And the counter's never reset. So you could have two totally independent Prometheus servers scraping the same target and getting the identical data and so prometheus does that stores these metrics
Starting point is 00:29:12 locally in a local storage i should say that currently we only really for for the querying we only really have a local on-disk storage. Our goal was to have single server nodes which are completely independent of any other thing on the network. When things really go awry and you need to figure out what's going on during an outage, you really can go to that one server and look at your metrics without having to depend on complex distributed backend storage and so on. We do have support for writing to, experimental support for writing to OpenTSDB and InfluxDB at the moment, but it's not possible yet to read back from those through Prometheus via Prometheus' query language.
Starting point is 00:30:06 So if you want to get data out of those again, currently you would still have to then head to those other systems. But that's on the long-term roadmap. We definitely want to have a long-term storage that we can read back from. The local storage is good for a couple of weeks or maybe even months, maybe longer, depending on how much data you have. But it's not really meant as a forever storage. That's just a simplicity decision just because you guys want it to be simple.
Starting point is 00:30:35 Yeah, on one hand, it's much simpler to implement, of course, than a distributed system. And we also believe that through the simplicity, hopefully, you'll get more reliability out of this in the end. So if, let's say, you wanted to have HA, high availability, you would simply run two identically configured Prometheus servers scraping exactly the same data. And if one goes down, you still have the other one to go to. But they're not clustered. So they're completely independent of each other. And if you want to investigate state during an outage, you just need one of them up. And you can go to either one and see what's actually happening.
Starting point is 00:31:16 Okay, so normally instrumented jobs are one of the three types of things that Prometheus can collect data from. But you might also have something like a Linux host machine or HAProxy or Nginx, things that you cannot easily at least instrument directly. You probably wouldn't want to go into the Linux kernel and build a module that exports Prometheus metrics over HTTP, right? So for that, we have a set of export servers, we call them exporters, which are just basically little jobs,
Starting point is 00:31:52 little binaries that you run close to whatever you're interested in monitoring. And they know how to extract the native metrics from that system. So for example, in the case of the host exporter, it would go to the proc file system and give you a lot of information about the networking and the disks and so on and so on.
Starting point is 00:32:14 And these little exporters then transform what they collect locally into a set of Prometheus metrics, which they again expose on an HTTP endpoint for Prometheus to scrape. And that's how Prometheus metrics, which they again expose on an HTTP endpoint for Prometheus to scrape. And that's how Prometheus can get information from these kinds of systems. And we have a lot of exporters for all kinds of systems there already.
Starting point is 00:32:35 Finally, the third kind of thing you might want to monitor and which can be a challenge is things like batch jobs or things that are just too short-lived to be exposing metrics and to be scraped reliably by Prometheus. So in that case, let's say you have a daily batch job which deletes some users or so on, and you want to track the last time it ran successfully and how many users it deleted.
Starting point is 00:33:02 For that, we have something called the push gateway, which is kind of the glue between the push and the pull world, which you're only really supposed to be using when you really have to. And the batch job could then push at the end of its run, usually these metrics, the last run and the deleted users, to that push gateway. And the push gateway would simply hold on
Starting point is 00:33:23 to those metrics forever. And the push gateway would simply hold on to those metrics forever. And the Prometheus server can then come by and scrape it from the push gateway. And yeah, so that's kind of the data ingestion side of things. In the architecture further there, so after the data is collected and stored, we can do two interesting things with the data. We can look at it as a human on the dashboard
Starting point is 00:33:49 or directly on the Prometheus server. So for dashboarding, we have a couple of solutions. We have Promdash, the Prometheus dashboard builder. It's really kind of a UI-based, click-based dashboard builder, similar to Grafana. When I started building Promdash, Grafana, to my knowledge, didn't really exist yet or not at all. But it's roughly comparable to that. But since then, Grafana now directly from the Prometheus server.
Starting point is 00:34:31 That's kind of a power user use case where you can build any kinds of HTML based dashboards. And these templates then have access to the query language of Prometheus. So they allow you to build even dynamic layouts depending on the data that you have in your Prometheus instance. So that's visualization. And then the last part that we do in Prometheus is alerting. So you have collected a lot of data now
Starting point is 00:35:04 about all your systems, your hosts, and your services. And now you can actually a lot of data now about all your systems your hosts and your services and now you can actually make use of that data to see if something is wrong somewhere to see if a batch job hasn't run for a while to see if the request rate of some services are too low or errors are spiking up and you can actually use the same powerful query language that you can use to display stuff. You can use the same language to formulate alert conditions under which people should get notified. And since you might have multiple of these Prometheus servers that each compute these alert conditions in the company, you might want to do some correlation between them and alert routing and so on.
Starting point is 00:35:47 And that's better done in a central place. So you'll usually have one or a few alert managers in your company. That's a separate binary again that you run usually once. That all the Prometheus's in your organization send currently firing alerts to. And the alert manager then can do things like inhibit one alert if another one is firing.
Starting point is 00:36:12 It knows how to route alerts based on the key value dimensions on the alerts to specific notification configurations, to specific teams and so on. And it supports a range of notification mechanisms like pager duty, email, Slack, and so on. So that's kind of the overall overview over Prometheus. Just one question on the visualization side. What's the purpose of having a separate, like the prom dash aspect and then also built-in
Starting point is 00:36:44 graphing and querying. Is one for a certain use case and one for a different use case? Yeah, definitely. So the built-in graphing is really more useful for ad hoc exploration, really off data that is in one Prometheus server. And that's good, you know,
Starting point is 00:37:01 even if your prom dash is down and you really just want to see what's happening in one Prometheus server, you can go there. You can do very rudimentary graphing so it doesn't have all the bells and whistles that PromDash has, you know, like stacked. It does have stacked graphs, but it doesn't have like multiple axes,
Starting point is 00:37:18 multiple expressions in one graph, different color schemes and things like that. So it's quite simple, but it allows you in the worst case to still explore the data in that Prometheus server. And Promdash is really a dashboard builder. So that's for when you really want to persist a dashboard forever and for other people to see and to share.
Starting point is 00:37:40 And especially it's very useful, let's say, I think in SoundCloud we have maybe roughly 50 Prometheus servers. And we have one central promdash installation, which just knows about all these Prometheus servers. And in there you can then have dashboards or even single graphs where you show time series or query expressions from multiple different servers in one graph. So yeah, it's more of this nice wall dashboard use case. Yeah, so the alert management would be part of the built-in UI. The configuration of your alerts and stuff would be what you'd use the built-in UI for?
Starting point is 00:38:21 Yeah. So for alerting, that's actually part of that that's partially in the prometheus server and partially in the alert manager okay um so in the prometheus server you can define rules um basically rules that alerting rules that get executed let's say every 30 seconds or one minute commonly uh depending on what you configure and what happens there is that it really just executes a query expression and sees if there are any results from that expression. We maybe should go a bit into how the query language works. And if there are any results from that expression,
Starting point is 00:39:00 they get transformed into labeled alerts and get transferred to the alert manager where they can then be duped, silenced, rooted, and so on. And this is kind of interesting because this whole labeled key value data model goes all the way from the instrumented services to the storage, to the querying, and all the way to the alert manager.
Starting point is 00:39:23 So you really have that chain of dimensional information to work with at every point in the chain. Yeah, it sounds like everything builds off the query language and the query language builds off of the data model. Exactly. So maybe the data model is probably the next place to dig in and tell us what it is, how it all works, and maybe if that's unique to Prometheus
Starting point is 00:39:46 or something you took from somewhere else. Just go into the details on how the actual data is modeled. Sure. So Prometheus stores time series. And time series have a metric name, and they have a set of key value dimensions, which we just call labels. So you might have something like a metric name, HTTP requests total, which tracks the total number of HTTP requests that have been handled by a certain service instance since it started.
Starting point is 00:40:19 But then you might be interested in drilling down right you would want to know which of these are get requests which which path handlers have been hit and so on and for that you can use the label dimensions so for example you might have method equals get on there and you might have status status equals 200 for the outcome and so on and these dimensions then get stored and they allow you to query time series by these dimensions so you could say you know some over all the dimensions except the status code dimension then you would get the total number of requests over all your service instances but keyed by by the status code so that dimension would be preserved or you could just select a specific dimension or you can even do so let's say you have one metric and you have all these kind of sub-dimensional instantiations of that metric. You know, one for method equals get, one for method equals put,
Starting point is 00:41:32 and then under these you have, you know, the other labeled dimensions. So for one metric name, you actually get a lot of time series with all these different label sets. And now if I just query for just the metric name, I get all these time series back. If I don't filter, if I don't aggregate and so on. And that can be very useful. So let's say on Bazooka, we have a use case where we have one set of these time series just describing for every instance running on Bazooka. What is the memory limit?
Starting point is 00:42:06 How much memory can it use before the cluster manager kills it, right? And we have another metric called basically the current memory usage. And if we just have these two metric names, we can actually, in the query language, just put a minus in between them to subtract the current
Starting point is 00:42:25 usage from the limit to get kind of the headroom you know the the memory that they can use still use before they would get killed if we wanted to know like how well do instances utilize their memory and what would actually happen if we just put a minus between these two metric names is that not only a single number there's not only a single number on the left or a single number on the right, but you have these whole, let's say, vectors of time series on each side of this binary obturation, and they get matched on identical label sets.
Starting point is 00:43:00 So the usage of one instance is matched with the limit of another instance and so on and so on and in the end as the output of the expression you get again the current headroom per instance with all the dimensional labels still preserved and you know you can do go more fancy than that you don't need to have an exact match there there's like several language constructs that allow you to do one to n or n to two one matches and so on and and specify how exactly to match things but this kind of vector-based matching algebra i think is quite unique to prometheus at least in the open source world yeah so the you give it a name and then a series of labels.
Starting point is 00:43:46 And it sounds like the labels, that's what you refer to as the multidimensional aspect because each label you add adds a dimension to that particular time series. And then your guys' built-in querying for that construct is really where it sounds like the flexibility is coming from. Am I following you? Yep, that's totally correct. And maybe one word of warning for the labels they're really meant to be
Starting point is 00:44:10 kind of dimensions but they're not meant to be of arbitrary cardinality so let's say if you wanted to store a user id of a service with millions of users, you probably would not want to use a label value for that because you would suddenly get millions of time series for this one metric. So you really have to be aware of that. Every combination of labels on a metric creates one new time series automatically. And these time series are indexed and so on
Starting point is 00:44:44 and they need to be managed um so if you really want to have that kind of uh highly arbitrarily uh high cardinality dimensional insight like storing email addresses or storing user ids and so on or the content of my sql queries, the actual query string, then you're probably better served with something like a log-based system, InfluxDB or Elasticsearch and so on, that really can store individual events, individual things with arbitrary metadata. So I can see where the labels might get a little bit where there's better and worst
Starting point is 00:45:27 practices with them whereas you know with a more just a key value namespacing thing it's it's pretty easy to just come up with the next name you drill down one dimension but as you add dimensions i can see where it get difficult and you're in fact warning against things not to do is there a place to go where it's like hey what how how would i do this in a typical situation because i think across many organizations the type of metrics are similar do you guys have best practices or things you've learned at soundcloud best ways to use prometheus labels oh yeah definitely um so we actually have a whole section on best practices uh at the very bottom of our website about metric and label naming and how to build good consoles, dashboards,
Starting point is 00:46:07 and alerting and so on. I think, yeah, one thing that really just happened sometimes at SoundCloud is that people mistakenly, either by not yet knowing the Prometheus data model well enough or just by making a simple mistake in the code, have set some of these label dimensions, let's say, to a track ID or a user ID. And that then creates millions and millions of time series.
Starting point is 00:46:32 I mean, a single Prometheus server can handle millions of time series. But if you just overdo it a bit and you're not careful about what you stick into label values, then you can really easily blow up a Prometheus server. So keep those label dimensions to sane, bounded things. So you always have Prometheus automatically attaches some of them anyways. So you get the name of the job, which is kind of the name of the service.
Starting point is 00:47:01 It's just terminology, I guess. The name of the service, which we call job, the host and port of the instance by default. And that already gives you some dimensionality, even if you don't have any labels on the side of your service, right? So you at least get, if you have 100 instances, you get 100 times series for this one metric, which could be the
Starting point is 00:47:25 number of HTTP requests. And then you have to multiply that by all the other dimensions that you add. And that can easily end up for a single metric, you can easily get, you know, thousands or even 10,000 of time series. Well, certainly lots of moving parts when we talk about Prometheus. So I'm going to assume that based on this conversation, so many people are like, I want to try it out. I want to get started. So we're going to take a quick break. And when we come back, we're going to talk about just that. We'll be right back. I have yet to meet a single person who doesn't love DigitalOcean.
Starting point is 00:48:02 If you've tried DigitalOcean, you know how awesome it is. And here at the Changelog, everything we have runs on blazing fast SSD cloud servers from DigitalOcean. And I want you to use the code CHANGELOG when you sign up today to get a free month. Run a server with 1GB of RAM and 30GB of SSD drive space totally for free on DigitalOcean. Use the code CHANGELOG. Again again that code is changelog use that when you sign up for a new account head to digitalocean.com to sign up and tell them the changelog sent you all right we're back with uh julius voles talking about prometheus uh and while we were on that break, we realized that getting started is a good step to go towards next.
Starting point is 00:48:47 But we forgot. We want to kind of go back a little bit on this religious piece of push versus pull when it comes to Prometheus. So Julius, why don't you lead us through that piece there? Sure. So this is funny because it's a bit of a religious thing. And push can be, you know, pull can be sometimes better sometimes push is better depending on the type of environment you're using the prometheus in but one of our team members even wrote a blog post about push versus pull for monitoring he's he's brian from from dublin
Starting point is 00:49:18 and you can find that in our faq actually but I think some points are interesting. So if you do, so I think first let's start with one advantage of push. Push is really easy to get through firewalls if your monitoring system is easily reachable from everywhere. You only need to make one point, one network point available on the internet
Starting point is 00:49:40 or in your local, in your company's network or whatever. And then everyone just needs to be able to push somehow to that. With pull, sometimes people run into the problem that let's say, you know, if they have setups where they need to pull from various endpoints on the internet and they should be secured and so on,
Starting point is 00:50:03 you know, they have to have a bit more, they need to now secure and make available N endpoints instead of one. So that's often what pains people when they can't use pull. But for us, especially in these kind of modern web company environments where you have your own data centers
Starting point is 00:50:24 or your own virtual private clouds and you have internally trusted environments where you can just pull from every target pull really has a number of advantages so one thing that's really really nice is that you can just manually go per http to a target and get the current state of the target. So by default, if you go to a Prometheus endpoint on a service, you will get a text-based format that will tell you the current state of all the metrics. And you don't even need a server for that. So that's one nice thing.
Starting point is 00:51:00 You can run a complete copy of production monitoring on your laptop or anywhere. You can just bring up a second copy of all of it to do experiments, to try out new alerting rules and so on. And that copy will get the exact same data as your production version of monitoring without you having to configure the actual services to send data somewhere else. And we kind of argued that if you're doing service monitoring and alerting, you kind
Starting point is 00:51:30 of need to know, your monitoring system kind of needs to know anyways where your services live and which services should currently be there. Because otherwise it can't really alert you about a target being down or so on because it doesn't know if it should be gone if it was deprovisioned or you know if it is just crash looping for example so with that kind of argument the monitoring system should be knowing what your targets are anyways so the knowledge is already there so that also makes it easier to pull the data and makes it easier to tell in monitoring and alerting whether a target is currently down.
Starting point is 00:52:12 And yeah, so we don't think otherwise that it's like a huge issue, whether you do push or pull, especially in terms of scalability, it doesn't really matter that much. But yeah, it kind of depends on your environment. I think there would be some scalability aspects of pulling as you had more services, more hosts. I guess
Starting point is 00:52:35 you had your StatsD servers drop in UDP packets. It seems like catching UDP packets is a lot easier than going out and requesting data. Have you found in practice that that's just not a big issue? Yeah, so that's actually an interesting point. So that's really not an issue at all. So the actual pulling side of things has never been a bottleneck for us. But it's also very important to point out here that the whole fundamental way of how data is transferred is quite different in the StatsD model to the Prometheus model.
Starting point is 00:53:07 As I said earlier, in the StatsD model, you send UDP packets basically proportionally to the amount of user traffic you get, right? Like for every HTTP request or every 10th or so on, you send a UDP packet, please count this, please count this, please count this. Why don't you just increment a number in memory on your web server?
Starting point is 00:53:29 And then every 15 seconds or so, transfer the current counter state. So that's Prometheus' philosophy. The nice thing is there, it uses way less traffic, like orders of magnitude, less traffic. It uses less computation in the client, especially if you have services that do many thousands or even more requests per second. You might have some multi-core high-performance rep routers which can do hundreds of thousands or more requests.
Starting point is 00:54:01 And sending a UDP packet for every request would actually be quite prohibitive and the other thing is that if these counter udp packets in the statsd world get lost you just get a lower total request rate displayed in your monitoring system and you have no clue that these packets were actually lost with the prometheus, if a scrape fails one time, it doesn't really matter so much because let's say the next scrape works, you will still not lose any of these counter increments that have happened because they are tracked on the service side. In every instance, these counters are just continuously incrementing from the start of the instance. And every time I come by, I just see what's the current state. And that's also a very good argument for not doing any kind of rate pre-computation
Starting point is 00:54:54 on the service side, but doing that on the Prometheus server side. So in your service, really just count things up. Don't expose rates. Because let's say if you do expose rates, there's just kind of a derivative of a counter. Then you might really, if you miss a scrape, then you might really miss a peak in a rate.
Starting point is 00:55:16 And if you miss a scrape with a counter, you just get a bit worse time resolution over that data, but you would never miss any increments of that counter after Y. That makes sense. That certainly makes sense on the theory of why you go which path, because on one side you can lose data, on the other side you're just kind of missing some time. Exactly, yeah.
Starting point is 00:55:38 And that was actually interesting. The way I fixed this whole stat-study dilemma before we had Prometheus in SoundCloud was actually quite similar to what Prometheus is doing now. So I actually put a local StatsD on every host where services were running and services were just sending local UDP packets to those StatsDs. And then these local StatsDs would pre-aggregate those counters over half you know, half a second or so, and then send that resulting counter to the global stats. So that's kind of similar. You're already kind of moving the aggregation to the individual hosts, but you're not having it in the same process. And Prometheus is even moving that into your process and into your memory space.
Starting point is 00:56:22 And yeah, you don't need to create a network package just to counter request or something else. I don't know if that's interesting. There are other types of metrics that Prometheus supports besides counters. So we have gauges. Maybe I should go into what these are depending on where you want to go now.
Starting point is 00:56:43 I think it would be awesome to go that much deeper, but I think we're getting close to our time. So what I'd like to do is cap there. Maybe you will write an awesome blog post and we'll dive deeper into that or something like that, or maybe we can have you back on at some point. But I think at this point, let's dive into getting started. So for those that are going to Prometheus
Starting point is 00:57:04 and thinking like, man, this is really awesome. I want to check this out. If you go into the documentation area, there's a getting started. I think that's actually what the button on the homepage takes you to. Is that right? The getting started button. Yes, it takes you right there. So if you go to Prometheus.io and you click the button on the homepage, which says get started, you actually
Starting point is 00:57:26 get started, which is kind of nice. But you get this really awesome guide, a Hello World-style guide that sort of takes you through from zero to running a Prometheus server. So what is it like to get started, I guess, maybe moving away from other monitoring services? Can you walk through some of the pains potentially or the process to get started with Prometheus? Sure. I think one of the most consistent feedbacks we have gotten about Prometheus is how easy it is to get started. So that's actually quite nice.
Starting point is 00:57:55 The reason is that Prometheus is written mostly in Go. I mean, the server is written completely in Go. There are client libraries for different languages and so on. But especially the server being written in Go. There are client libraries for different languages and so on. But especially the server being written in Go and Go producing, you know, statically compiled binaries that you can just deploy on a machine without having to think about, you know, runtimes or shared libraries and so on. That makes it very easy to get started and deploy. We have pre-built binaries that you can download for the major architectures. It's also very easy with our make file to download all dependencies in a hermetically contained environment to just start building Go from head yourself or from some
Starting point is 00:58:41 release version if you want to. you need to create a configuration file there's one in the getting started guide here of course that's just one file you point to it and then by default prometheus will just store all your data in a local directory and yeah and it will just start scraping data so you can i mean it takes roughly if if you're fast it takes maybe five minutes to get started and then you have a running prometheus server um of course for that to be interesting uh you need some example services that that you that you can scrape and so on and there of course it depends a bit on on what you want to instrument um promdash um is the one exception in the whole ecosystem which is not written in go it's actually it's actually a rails application um but it's really more of a light back end i
Starting point is 00:59:35 mean, the whole Rails back end really only stores the dashboards as JSON blobs and could theoretically pretty easily be replaced by something else. All the logic is in the JavaScript front end. But we have Docker containers for everything as well, for all the components. So if you really feel like, oh, I really don't want to set up Rails, you know, just use the PromDash Docker container and hopefully that will be less painful. But I mean, it's basically as easy as this: you need to download the latest binary, unpack it, drop in a config file, and just start it, and it's running. And by default, one of the default configuration files here is set up in such a way that Prometheus collects data on its own metrics exposition endpoint.
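As a rough sketch of those download-unpack-run steps (the version in the URL and archive name is a placeholder to fill in from the releases page; recent Prometheus versions take the double-dash --config.file flag, while older 0.x builds used a single dash):

```sh
# Grab a release tarball for your platform from the GitHub releases page.
wget https://github.com/prometheus/prometheus/releases/download/v<version>/prometheus-<version>.linux-amd64.tar.gz
tar xvfz prometheus-<version>.linux-amd64.tar.gz
cd prometheus-<version>.linux-amd64

# Point it at a config file; data is stored in a local directory by default.
./prometheus --config.file=prometheus.yml
```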
Starting point is 01:00:28 So Prometheus instruments itself via one of the Prometheus client libraries. So it can monitor itself, basically. So that's a nice use case to get started if you just want to look at some very simple Prometheus metrics without having any services.
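For reference, a minimal config along the lines of that self-scraping default might look something like this; this assumes the YAML config format used by more recent Prometheus releases (very early versions used a different format):

```yaml
global:
  scrape_interval: 15s   # how often to scrape targets

scrape_configs:
  # Scrape the Prometheus server's own /metrics endpoint.
  - job_name: prometheus
    static_configs:
      - targets: ['localhost:9090']
```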
Starting point is 01:00:45 Another thing that's really nice to get started with, because everyone has this, is the node exporter, which basically, by the way, has nothing to do with Node.js, but with the host. So the node exporter is a host exporter; it exports host metrics. And that's a really nice thing to get started with. You just start it. I mean, you can set a lot of command line flags, but if you don't specify anything, by default it will do the right thing. And you configure Prometheus to scrape that either statically or via some kind of service discovery, and then you get host metrics about either your local machine or your data center machines and so on. That's pretty easy too. While we're talking about getting started, I gotta imagine that people are saying, okay, when I get started, I also want to have a
Starting point is 01:01:35 community to sort of hang around. So you've got a Twitter handle, of course you've got a mailing list, and you've got IRC. So those are three ways that people can hang out and sort of catch up. I was on the mailing list recently and can see that it's pretty lively and active. So when you're getting started, if you have any questions, then there's this mailing list to look at as well, which we'll link up in the show notes,
Starting point is 01:01:57 of course. And definitely stop by the IRC channel. So we're there basically every day, very active. A lot of people are coming there asking questions and we're always super happy to answer. And yeah, so that's kind of the fastest channel to reach us. And the mailing list is good for longer questions and more persistent communication. So it's Prometheus on Freenode and then Prometheus Developers as a Google group,
Starting point is 01:02:26 which we'll link out to so you don't have to worry about trying to say that URL. That's not readable. That's not pretty. Which URL? Well, the Google group. It's not quite as easy to spell out. Yeah, you don't want to read that out in a podcast, no. No, that's boring.
Starting point is 01:02:42 Changelog.com slash 168. You'll get all the links. We even found that blog post of Push versus Pull that he referenced, so we'll have that in there as well. Or just head over to prometheus.io, click on the Community tab, and you have all the channels there. We're very, very happy about any contributors, and I think who we could especially use,
Starting point is 01:03:06 because we're all backend people, is someone who really likes doing frontend stuff. That's traditionally what's always lacking in these kinds of infrastructure projects. That's a good segue there, Jared, to the call to arms then. That's right. Sounds like one. Julius, if you were going to request help or give a call to arms to the open source community, would you say front-end developers is what we're after?
Starting point is 01:03:29 What would you say to the open source community, how we can help you out? Yeah, in general, it would be great to have more front-end interested people in the infrastructure world, right? And that goes for Prometheus as well. We've been coding a lot of that ourselves, you know. PromDash is very front-endy, and so is the graphing interface in Prometheus itself. But it would be really great to
Starting point is 01:03:53 get people who feel really strongly about infrastructure and nice front-ends and help us, you know, refactor a lot of things there, improve the UI, make it shiny. That's definitely always a nice thing to have. But you know, any other kinds of contributions are great too. I think two of the areas that are currently still lacking and that will get the most
Starting point is 01:04:18 attention in the future are the Alertmanager, which we are currently redesigning and re-implementing over the next months to be more production-ready and more powerful, but also some kind of long-term storage integration. So we have these ways of writing out data, currently to OpenTSDB or InfluxDB, but it would be really great to have a full read-back implementation where you can query the long-term storage through the Prometheus server again. And you know, if someone either wants to implement that for an existing back-end system or wants to maybe even create a completely new Prometheus-specific long-term storage, that would be interesting as well. But there's a lot of stuff to do. Maybe head to the different issue trackers
Starting point is 01:05:06 on the various Prometheus GitHub projects, which are all under github.com slash Prometheus. And check out if there's anything that looks interesting to you. So there you have it. Sounds like lots of different ways to get involved. And while we're asking our closing questions, Julius, we would be remiss not to ask
Starting point is 01:05:24 the one everybody loves, which is who is your programming hero? I hoped you would not ask that one. Okay. No. Bjorn. Definitely Bjorn. There you go. Bjorn is one of my partners in crime on Prometheus.
Starting point is 01:05:41 We're quite a bunch now, actually. Actually, this is funny because we also hired an intern right now who we are going to transform to be a full-timer at SoundCloud. And we found him through Prometheus contributions. And he's very young, like 23, and he outcodes me
Starting point is 01:06:00 every day. He's very, very, very smart. Every day, I'm astounded by the... what's his name? Fabian. And yeah, I'm every day astounded by the quality and the quantity of his coding, but also by his communication in the community. A really, really great person. I guess more in terms of traditional programming heroes, when I was a child I had a bit of a coding crush on John Carmack, you know, with the early id games in the 90s, Doom and so on. Definitely, in the Go community, Rob Pike. And you probably have heard about, or even met at GopherCon, Dmitry Vyukov.
Starting point is 01:06:50 He's from Google. He's not really on the Go team, but he's on the Dynamic Tools team of Google. But he has contributed so many awesome, awesome features to the Go runtime and tooling around that. The race detector, the new tracing framework, this fuzzing framework that also just now found actually a bug in Prometheus's query language, really great. And a lot of these really hardcore tools for getting dynamic information about your code.
Starting point is 01:07:23 And he found hundreds of bugs with that. So I was really impressed when I heard about that. And he also gave a really great talk about that at GopherCon that I can highly recommend. Yeah, that was another one I didn't mention at the top of the show. Ben Johnson and his open source database stuff. And then Dimitri, and specifically his talk at GopherCon,
Starting point is 01:07:44 like you just said, was one that everybody was kind of raving about as they came out of the conference room. So you're not the only one who thinks he's pretty awesome. Yep. Alright, Julius, well, it was great having you on the show. Definitely
Starting point is 01:07:58 something we've been wanting for a while, to get you on the show to talk about Prometheus and everything it's doing and what you're all doing at SoundCloud. So definitely fun having you on the show today. I want to thank our awesome sponsors for the show, CodeShip, TopTile, and DigitalOcean, making this show possible. I also want to thank our awesome listeners and remind everyone that's not a member yet that we are member supported. You can join the community and get access to the members-only Slack channel as well as many other awesome benefits of supporting the Changelog.
Starting point is 01:08:28 Well, go to changelog.com slash membership. And while you're there, you might as well sign up for Changelog Weekly and Changelog Nightly, which are our weekly and nightly emails, respectively, at slash weekly and slash nightly. Uh,
Starting point is 01:08:41 Jared, what's next week's show? We do have one show scheduled. What is next week's show? You're putting me on the spot, man. I think it might be Ben Johnson. I know we've got a couple database shows. I know he's coming up, and he records on August 14th, so we'll have a show between him and now, but we don't know who it is.
Starting point is 01:09:05 We don't know who it is. Okay, we're going to try and tease out what the next show is, but nonetheless, we have lots of awesome shows coming up soon. But until then, let's say goodbye. See ya. See ya. Thank you. We'll see you next time.
