PurePerformance - The De-Facto Standard of Metrics Capture and Its Untold Histogram Story with Björn Rabenstein

Episode Date: June 19, 2023

As far as we know, besides Kubernetes there is only Prometheus that belongs to the prestigious group of open-source projects that have their own documentary. Now why is that? Prometheus has emerged as the go-to solution for capturing metrics in modern software stacks, earning its status as the de facto standard. With its widespread adoption and a constantly expanding ecosystem of companion tools, Prometheus has become a pivotal component in the software development landscape.

Join us as we sit down with Björn Rabenstein, an accomplished engineer at Grafana, who has dedicated nearly a decade to actively contributing to the Prometheus project. Björn takes us on a journey through the project's early days, unravels the reasons behind its meteoric rise, and provides us with insightful technical details, including his personal affinity for histograms.

Here are the links we discussed during the podcast for you to follow up:

Prometheus Documentary: https://www.youtube.com/watch?v=rT4fJNbfe14
First Prometheus talk at SRECon 2015: https://www.youtube.com/watch?v=bFiEq3yYpI8
The Zen of Prometheus: https://the-zen-of-prometheus.netlify.app/
Talk from Observability Day KubeCon 2023: https://www.youtube.com/watch?v=TgINvIK9SYc
Secret History of Prometheus Histograms: https://archive.fosdem.org/2020/schedule/event/histograms/
Prometheus Histograms: https://promcon.io/2019-munich/talks/prometheus-histograms-past-present-and-future/
Native Histograms: https://promcon.io/2022-munich/talks/native-histograms-in-prometheus/
PromQL for Histograms: https://promcon.io/2022-munich/talks/promql-for-native-histograms/

Transcript
Starting point is 00:00:00 It's time for Pure Performance! Get your stopwatches ready, it's time for Pure Performance with Andy Grabner and Brian Wilson. Hello everybody and welcome to another episode of Pure Performance. My name is Brian Wilson and as always Andy Grabner over there is making fun of me as I do my intro. It wouldn't be Andy without him. So hello Andy and how are you doing today? I'm not making fun of you, I'm just trying to make you laugh. Well, mimicking me, mocking me, trying to make me crack a smile on the camera so all our viewers can see. You know, speaking of viewers, Andy, I gotta tell you, I had another weird dream. I don't know how I got from one to the other. Yeah, so it was very bizarre. It was getting to be nighttime and we had this big power outage
Starting point is 00:01:01 Right? And my daughter Adele wanted to stay up. She was very ambitious, had a lot of things she wanted to get done. So I went and got her a lantern, and I gave her a lantern, which gave her the ability to stay up longer and, you know, go through the night quite a lot longer. And then out of nowhere you show up at my house and you're like, Brian. I'm like, yeah? He's like, your daughter Adele is young and she must go to sleep. What are you doing giving her this lantern? I forbid it. And I'm like, but come on, Andy, she's working on a project. And he's like, I say no. And you took me and tied me up to a pole. Right? And I'm like, what's going on here? You're like, yeah, I'm pretty open-minded, but I'm not sure I'm so into this. Right? Like, Brian, you're going to learn your lesson. And then through the front door, good friend of the show, Mark Tomlinson walks in. And I figure he's going to come save me. Right? And what does he do?
Starting point is 00:01:54 He pulls out a knife, cuts open my side, and starts picking away at my liver. And I'm like, what the hell is going on here? And then I woke up and I realized we had the podcast today, so I had to get that out. I don't know what ended up happening, but it's the second time I've had that dream at least. So I don't know what's going on. I think you should start thinking about talking to some people.
Starting point is 00:02:16 They can help you with this because it doesn't really sound normal. I was actually thinking, with the lantern, you were going to go to something like lighting a flame. Because obviously a lantern itself doesn't do anything. Well, it would have to go straight to the flame. The lantern is the flame, and the punishment is the same. Yeah. All right.
Starting point is 00:02:35 Come on. That was the story of Prometheus. Look at that. It took us a long, long time to get there. That wasn't too long. Come on. No, it's good. It's good.
Starting point is 00:02:49 Hey, well, Brian, thank you for sharing the dream, but also so much thanks goes to Björn, who is with us today. He's not only a guest, but he's also a listener, as he shared with us. So Björn, servus. Thank you for being here. How are you?
Starting point is 00:03:05 I'm good, thank you. I hope you are good as well. I'm really glad to be here. It always feels like magic showing up in a podcast that you have listened to before. And I hope you're okay with all these strange stories. And maybe we will find some help for Brian for the future to make sure that his dreams become less violent. So let's make sure this podcast goes smoothly and that you will have
Starting point is 00:03:32 beautiful dreams tonight. We had this saying even in the early days of Prometheus, sometimes using Prometheus feels like bringing fire to humanity and sometimes it feels like your liver gets picked by an eagle. Hey Bjorn, talking about Prometheus, I think some people may know you because you've been speaking at different events. We met each other just a couple of months ago at KubeCon in Amsterdam and then I looked back in history,
Starting point is 00:03:59 you sent a couple of links over, you've been speaking at different Kubernetes days, different cloud-native events, Prometheus events. There's even, and thanks for the pointer to this, a Prometheus documentary that has been out there, I think, for like six or seven months, which really gives a great overview of where it started and why it actually became that popular. I remember one of your colleagues
Starting point is 00:04:29 who was on the documentary, he said there was like a special moment when somebody put Prometheus on Hacker News, and then all of a sudden it took off, and you still try to figure out who that was. So in case whoever that was is listening in, let us know, because the world is trying to figure out who made this magic moment happen. Björn, maybe from you, can you give a little bit of background just for people that are maybe not familiar with the story
Starting point is 00:04:54 and how it was from your perspective? Yeah, I mean, the documentary has this interesting quality that it is mostly non-technical. It really tells the non-technical part of the story. And that's maybe more interesting these days, as many people in our profession are very familiar with Prometheus from the technical side. The story from my side is probably that I got kind of lured over to SoundCloud by Julius and Matt, who started the whole project very early in the lifecycle, like 2013, I think. They told me they were doing some open source monitoring in the same spirit as what we all know from Google. I was still working at Google. I had the privilege of essentially working from home, which back
Starting point is 00:05:45 then was a weird thing, especially for Google. And it was working out great. But I am actually not following the trend there. I'm a great fan of having my colleagues around me in the same room. And yeah, so I was kind of tempted to change jobs. And then they told me about that project. And a bit as in the documentary, where I think I'm saying that if I had been Julius's and Matt's manager,
Starting point is 00:06:12 I would not have approved the project. That was just a very, very weird idea they had. And I mean, I joined when, let's say, there was a working prototype. There was actual monitoring happening at SoundCloud with this very early version of Prometheus when I joined. So all that credit goes to Matt and Julius and the people who helped them at SoundCloud.
Starting point is 00:06:33 But then I joined, and from the perspective of now, it's like 10 years later, I kind of joined from the beginning, you could say, right? With a bit of an error of measurement. But even back then, I had no idea that this would, as I like to phrase it, change how the world is doing monitoring. Back then it was like, okay, all the ex-Googlers, they will understand it because they know the idea behind it. Then there will be five or six other people in the world who also will understand it and appreciate it. And that's it,
Starting point is 00:07:05 right? That was what I thought, and it will be a fun experience. Yeah, and then the rest is history. Everything changed 10 years later. Yeah. Yeah, I mean, Prometheus, as you said, right? I assume most people that listen to us know what Prometheus is. For the handful that doesn't know it, right? As you said, it's the de facto standard when it comes to capturing metrics and collecting metrics from your environment, wherever it runs. I think in the early days when you started at SoundCloud, as far as I recall the documentary,
Starting point is 00:07:36 you had your own orchestration engine. You didn't use... Docker was just starting, there was no Kubernetes back then, but you built this in-house. And then when the CNCF started, they actually asked you to become part of the foundation. And therefore, you've been an early project and you are the de facto standard when it comes to metrics. What I also thought was really interesting, Björn, to what you said earlier, you said, why should we build our own monitoring tool
Starting point is 00:08:07 when we're not a monitoring company? This was also an interesting quote in the documentary because there's always a big debate. Why do you build something yourself if this is not your core business? Because SoundCloud is not in the monitoring space. So can you give some recommendations on when you feel it is important to take that risk?
Starting point is 00:08:34 Yeah, so my recommendation is: don't. I mean, we have a huge bias because we only talk about the projects that became a success, right? So who knows how many little Kubernetes or Prometheis, whatever, other people created. Because, to be honest, this is the first thing: if there is something already out there that does the job, then
Starting point is 00:08:59 use it, right? Even if it's not a 100% match. And back then, this is what I say in the documentary, right? We would have muddled our way through with Nagios, but Nagios would have done only like 10% of the job, right? And StatsD, which was already in active use at SoundCloud, that was actually also a very important innovation that did perhaps 20% of the job, right? But then we still had a lot left, and there were even vendors who would collect metrics for you back then. It was just even more expensive than nowadays. So that was kind of not sustainable. And in that situation, there was enough motivation to do it.
Starting point is 00:09:41 But even then, it was a huge risk. And we only know, we are only here talking, because Prometheus went above that, whatever it is called, the great filter or something. Sometimes they talk about this when they talk about alien civilizations. And so this is hugely biased, right? So I really can't recommend that everyone just try your not-invented-here thing and do your project and hope it will become a popular open source project. I mean, there was an important point to make about why SoundCloud came up with their own little one; it was called Bazooka. I think it's also in the documentary somewhere. It's essentially a mini Kubernetes before there was Kubernetes.
Starting point is 00:10:26 And then Prometheus, which is, yeah, now we know it as kind of the monitoring system for Kubernetes, but it was actually created before anyone knew about Kubernetes, outside of Google, at least. But yeah, I would really still say
Starting point is 00:10:42 this is the last resort you should take. And we didn't expect it, right? You can never expect it to become a popular open source project that changes history. Andy, that reminds me, several years ago at this point we had, I don't remember the details of who was on or the specifics of what it was, but I think it was an episode about when to use off-the-shelf software, when to build your own, or when to modify something existing. And there were some guidelines around that, which seems similar to what Björn is
Starting point is 00:11:17 saying: if you can use what's out there, use it. It's when the gap is large enough that you should look into creating it on your own. This conversation applies to several different aspects. Monitoring, obviously, but also just even software packages and all that fun stuff. Especially, I think you also mentioned this in the documentary, there was an architectural shift.
Starting point is 00:11:47 I mean, the architecture changed. All of a sudden we were dealing with many moving pieces, with many microservices that are coming and going. And the observability tools, back then we called them monitoring tools, now we call them observability tools, were built for the previous generation of architectures. I mean, Brian and I, we've been at Dynatrace for many, many years. And looking back 10 years ago, the normal tech stack and the classical architectures
Starting point is 00:12:12 were not what we see now. And we focused on this. And I think now, in 2023, we are really grateful to have Prometheus as an amazing data source that we can use to enrich the data that we collect from other areas of the stack. But as you said, there was a shift in architecture, a shift in technology. There were parts of it available, but not really built for that type of architecture. And then, as you also said in the documentary, the Hacker News thing happened
Starting point is 00:12:44 and Kubernetes happened, Cloud Native took off, and that basically was the perfect ingredient for takeoff. But I guess you can never plan for this. Yeah, I mean, there were a bunch of projects like Prometheus because the time was ripe, but maybe we were really the first. But this moment where Google realized they cannot just run Kubernetes on their
Starting point is 00:13:10 own because no other vendor will trust it. They needed some kind of foundation, and then they realized we want a foundation not only for Kubernetes, but for cloud native, whatever that will be, and then realized we really need some monitoring for that.
Starting point is 00:13:27 And then they stumbled upon Prometheus. I mean, essentially the Kubernetes people stumbled upon Prometheus pretty quickly. And then we got this call. I think I'm also saying that in the documentary, right? Which was kind of very exciting. And yeah, I mean, they of course were happy because they knew the concept of Prometheus
Starting point is 00:13:49 was very familiar to them, as they were all coming from Google and knew that. And they of course couldn't use the internal monitoring for the open source project, and now they could use something that at least structurally was similar enough that it made sense to them.
Starting point is 00:14:07 And Bjorn, you mentioned earlier you had your 10-year anniversary, right? I think in the notes that you sent over, your first commit was on November 24 in 2012. So that's a little over, it's like 10 and a half years now. And how did you, I mean, you've probably touched pretty much every part of Prometheus, even though I know in the recent history, you know, you had a lot to do with the new histogram support, which we will talk about as well.
Starting point is 00:14:35 But you've seen pretty much everything, right? I mean, it's so big. There are certainly parts that I've never touched. Also, like the commit you were talking about is the very first in Prometheus at all. That was more or less a year before I joined SoundCloud. And it was by Matt Proud. So my first commit is probably sometime late 2013.
Starting point is 00:15:00 I haven't even looked it up. But this is when I really joined SoundCloud as a proper employee and started to work for real on Prometheus. And initially it was clearly like, there were a handful of people that were mostly sitting in the same room, which is another trap you can fall into with an open source project. If you are like this gang of people in the same room, then you might accidentally exclude others who are not in the same room. But maybe that's a different story. Yeah, then, of course, everyone is kind of in touch with everything. But then there's Prometheus as a whole ecosystem,
Starting point is 00:15:34 instrumentation libraries for all kinds of languages, exporters, as we call them, the name is sometimes confusing, Prometheus itself. But then there are so many vendors, other systems that implement the Prometheus APIs, maybe based on the same code base, maybe just mirroring the functionality. Alertmanager is a huge project on its own, if you want. I mean, it's part of the Prometheus org, but it's huge. And so on. There's a bunch of those things. And as you said, there's so much stuff out there already and the project has been around 10-plus years. We're not going to talk about the basics and how to get started. So folks, if you listen
Starting point is 00:16:19 and you want to learn the basics and what this is all about, we will add links to all the relevant Git repositories, documentation, and tutorials. You will find this all in the description. Now, your favorite subject, right? I remember when we met a couple of months ago in Amsterdam, we were actually at a little celebration party from some of the observability vendors
Starting point is 00:16:45 that actually were together. And then you said, hey, my favorite topic is histograms. Histograms, histograms. Yeah, that's your favorite topic. And first of all, why histograms? Why is this such an exciting topic? What's the history of the histograms?
Starting point is 00:16:58 Yeah, exactly. I mean, I gave many, many talks about Prometheus, and a lot of them are about histograms. One is actually called the Secret History of Histograms, which gives you all the background. But there's one anecdote, I think it's not even in that talk, which is my proof that it was always my favorite topic
Starting point is 00:17:22 because we had the very first talk about Prometheus at a real conference. We had like meetups before, I think, local ones. But there was SRECon 2015 Dublin, where the very first talk about Prometheus at an international conference was happening. All the people from what we would now call the cloud-native community were in the audience. And Brendan Burns, like, you could say he's the inventor of Kubernetes, was in the audience. And Q&A starts, and Brendan Burns raises his hand, and he asks, does it support histograms?
Starting point is 00:17:58 And my response is, that's my favorite topic. So even back then, right, eight years ago. And indeed, I think this was because that monitoring thing at Google was not really good at histograms in a way, but it is so important. Like the whole SLO, I mean, a lot of the SRE,
Starting point is 00:18:19 like the tools in the SRE toolbox rely on histograms: SLO tracking, Apdex score. Another interesting thing is tail latency. Tail latency is so important in distributed systems. So there are many, many scenarios where you want not just an average, you want percentiles, or you just want to see a distribution, let's say in a heat map. And all of this can be done by histograms.
Starting point is 00:18:46 And we always wanted to have proper histogram support, quote unquote. But of course, 2015 was like two and a half, three years after the first commit. So we had essentially a proof of concept. And we had to do it somehow, and the idea we came up with, I always saw it as essentially a prototype. We came up with what we now call classic histograms in Prometheus, so every bucket is in its own time series. Whoever has worked with that knows how that works and knows the pain. And it works really well with the whole execution
Starting point is 00:19:27 and collection model of Prometheus with PromQL. We needed to add one single function to PromQL, histogram_quantile, to make quantile calculation work. But we kind of shoehorned this histogram thing into the rather flat worldview of Prometheus, where every time series is just a series of floating point numbers. And it did its job.
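To make the classic approach concrete, here is a minimal sketch in Go with the prometheus/client_golang library (the metric name and bucket boundaries are illustrative, not from the episode). Every boundary listed in Buckets becomes its own request_duration_seconds_bucket series with an le label, plus the _sum and _count series:

```go
package instrumentation

import (
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

// Classic histogram: every boundary listed in Buckets becomes its own
// time series (request_duration_seconds_bucket{le="..."}), plus the
// request_duration_seconds_sum and ..._count series. All boundaries
// must be chosen here, at instrumentation time. The 0.1 boundary is
// included so a "99% of requests within 100 ms" SLO can be read off
// exactly.
var requestDuration = promauto.NewHistogram(prometheus.HistogramOpts{
	Name:    "request_duration_seconds",
	Help:    "HTTP request latency.",
	Buckets: []float64{0.01, 0.05, 0.1, 0.5, 1, 5}, // 6 buckets -> 8 series
})

func observe(start time.Time) {
	requestDuration.Observe(time.Since(start).Seconds())
}
```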
Starting point is 00:19:50 You could do Apdex score, you could do SLO calculation, you could even draw heatmaps if you really wanted to. They didn't have a really high resolution. But the most important thing: it was mathematically sound. Because back then, mathematicians knew about this, of course, but normal people like you and me didn't realize how percentiles work. They thought, I can get a 99th percentile from every instance of my microservice, and then I average them all to get the 99th percentile of my whole microservice.
Starting point is 00:20:24 No, that doesn't work that way and people don't believe it. I actually crafted an example with real numbers on the slide where you could see that it's completely different. Very counterintuitive and the mathematicians were preaching on the mountain all the time, but it took a long time until people realized, especially if you want to aggregate many, many tasks of a service, which is this typical microservice case, distributed systems case, you essentially need histograms or some kind
Starting point is 00:20:57 of what people call a digest, just kind of a compressed histogram, if you want. And at least the way we did it in Prometheus worked. It just had those problems that the resolution was really low. So if you actually wanted to calculate a quantile, the precision could be really bad. And it was quite expensive, because every bucket was its own time series, which is kind of the heavyweight element in the Prometheus TSDB. So if your bucket boundaries were appropriate, you could nicely do Apdex score, SLO calculations, all those things work. But if you wanted high-precision quantile estimates, that was bad.
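A quick worked version of the averaging fallacy described above, with numbers invented for illustration (these are not the ones from Björn's slide). Two instances each report a correct per-instance p99, yet averaging those p99s badly misstates the service-wide p99; a minimal sketch in Go:

```go
package main

import (
	"fmt"
	"sort"
)

// p99 returns the 99th percentile of a sample (nearest-rank method).
func p99(samples []float64) float64 {
	s := append([]float64(nil), samples...)
	sort.Float64s(s)
	rank := int(0.99*float64(len(s))+0.5) - 1
	if rank < 0 {
		rank = 0
	}
	return s[rank]
}

func repeat(v float64, n int) []float64 {
	s := make([]float64, n)
	for i := range s {
		s[i] = v
	}
	return s
}

func main() {
	instA := repeat(1, 100)   // 100 requests at 1 ms
	instB := repeat(100, 100) // 100 requests at 100 ms

	avgOfP99s := (p99(instA) + p99(instB)) / 2
	trueP99 := p99(append(instA, instB...))

	fmt.Println(avgOfP99s) // 50.5 -- looks plausible, but is wrong
	fmt.Println(trueP99)   // 100  -- the actual service-wide p99
}
```

The average of the per-instance p99s says 50.5 ms, while the true p99 over all 200 requests is 100 ms. This is why aggregation has to happen on the underlying distributions, histograms or digests, not on precomputed percentiles.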
Starting point is 00:21:44 You also had to pick the right bucket boundaries. If you realized at some point you wanted different boundaries, that was really painful, because changing those boundaries might mean you cannot aggregate anymore.
Starting point is 00:22:00 So a bunch of, yeah, what do I call it? PITA, right? I won't say it. Pain in the neck. That's, I think, the civilized way. Yeah.
Starting point is 00:22:17 So that means, Björn, just quickly to recap, if I understand this correctly: in the classic histograms, as you call them, you as the person that wanted to expose histogram data had to define basically what your buckets are. And basically you made this based on your assumption of what good buckets would be for that particular type of metric. Exactly right. So a good signal is if your SLO or SLA, let's call it a real SLA,
Starting point is 00:22:46 you have an SLA with your customer that says, we will serve 99% of requests within 100 milliseconds. Then you know you want a bucket boundary at 100 milliseconds. Great, right? But if that SLA changes next month to 80 milliseconds, not so good. And also, because of that cost, at SoundCloud we had a lot of three-bucket histograms, which is kind of not really what you want resolution-wise. And because of the cost,
Starting point is 00:23:14 you also would be very judicious with labels. Let's say you have an HTTP server, you want to partition by status code, endpoint, method, all those things, right? And then every partitioning is already a cardinality problem, the usual thing. But now you have, let's say, 10 buckets in your histogram, and that increases the problem by an order of magnitude. So people were really like, okay, we just have a counter for all those individual metrics, but we have just one big histogram. But then later you want to know,
Starting point is 00:23:47 okay, is this latency perhaps only happening in the 404s, or only happening for this host on that endpoint? And then you can't slice and dice, which is completely against the Prometheus philosophy, essentially. I mean, first of all, you have to know during instrumentation what the interesting latencies are. That's against the philosophy. And then you cannot partition, at least not as freely as you usually could.
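To put rough numbers on that order-of-magnitude point (the figures are invented for illustration, not from the episode):

```latex
\begin{align*}
\underbrace{5}_{\text{status codes}} \times \underbrace{20}_{\text{endpoints}} \times \underbrace{4}_{\text{methods}} &= 400\ \text{series for a plain counter}\\
400 \times (\underbrace{10}_{\text{buckets}} + \underbrace{2}_{\text{sum, count}}) &= 4800\ \text{series for a classic histogram}
\end{align*}
```

The same labels that cost 400 series on a counter cost 4,800 on a 10-bucket classic histogram, which is exactly why people dropped the labels from their histograms instead.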
Starting point is 00:24:11 I mean, you can never freely partition, because you always have a cardinality problem. But it's even worse with the classic histograms. And I think the big challenge, if I get this correctly, is that you have to ask your engineers to actually put these boundaries in their code. And there's no good separation of concerns between the type of data that you want to collect and what you enforce on the data. Yeah, we have this.
Starting point is 00:24:41 I have a talk from a meetup somewhere which is called Prometheus Proverbs, like the Go Proverbs that you might know, and a colleague made it into the Zen of Prometheus, I think. He even created a website, that's Kemal's website, we might link that as well. I'll take some notes here to make sure we do: the Proverbs of Prometheus, or the Zen of Prometheus.
Starting point is 00:25:07 The website is called the Zen of Prometheus, and it has more than just what's in my talk, where I was trying to act like Rob Pike. Some of the proverbs were made up by me, some were made up by others in the community, but one was: instrument first, ask questions later. That was the whole idea, you just put a metric in. Metrics are relatively cheap. They're much cheaper than other observability signals. So as long as you don't fall into the cardinality trap, you are pretty free to add metrics everywhere. And then the idea is that you don't put assumptions in while you instrument.
Starting point is 00:25:45 This is why you use counters and not gauges, because a gauge would be requests per second already. But if you just count requests, you can later decide, do I want to average over the last 10 seconds or the last 10 minutes? All those things. And similar with histograms, why do I have to decide what latencies are interesting and what
Starting point is 00:26:07 resolution I want? I mean, all those things, that's not what we wanted. It's against this proverb. And that's why I knew already in 2015 that we needed to do something else, but it took a long time. And one reason is that it's essentially not fitting well into the existing execution model and data model. It required a lot of changes throughout the stack. And that's why it took so many years. I mean, not that I worked on this all the time. There were many other things I had to work on. But in recent years, that was my usual topic.
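A minimal sketch of the "instrument first, ask questions later" proverb, assuming Go and the prometheus/client_golang library (the metric name is illustrative): the code only counts, and every question about rates and time windows is deferred to query time, e.g. rate(http_requests_total[10s]) versus rate(http_requests_total[10m]):

```go
package main

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// A counter only ever accumulates; no rate, window, or threshold is
// baked in at instrumentation time. Whether you want requests per
// second over 10 seconds or over 10 minutes is decided later, in PromQL.
var httpRequests = promauto.NewCounterVec(
	prometheus.CounterOpts{
		Name: "http_requests_total",
		Help: "Total HTTP requests served.",
	},
	[]string{"code", "method"},
)

func handler(w http.ResponseWriter, r *http.Request) {
	httpRequests.WithLabelValues("200", r.Method).Inc()
	w.Write([]byte("ok\n"))
}

func main() {
	http.HandleFunc("/", handler)
	http.Handle("/metrics", promhttp.Handler())
	http.ListenAndServe(":8080", nil)
}
```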
Starting point is 00:26:46 So then, how did you solve the problem? I mean, what's the situation now after you learn from the classical histograms, what worked and what didn't work? What's the new histograms? So from Prometheus' point of view, the new thing is that we now have a new metric sample type, like Prometheus and PromQL, the execution, the query language, was always strictly typed or statically typed, as in there are counters
Starting point is 00:27:16 and gauges and stuff like that. But the value type was always this infamous floating point number, which was a deliberate decision and simplified many things. Also, not having an integer. Some people freak out because there are no integers, which is another discussion. But now we have this Prometheus histogram data type. I should always say, in text, I usually capitalize histogram
Starting point is 00:27:43 when I mean the Prometheus data type, and I use lowercase histogram when I mean the statistical concept. In a podcast, it's hard to distinguish. Okay, so we have this new data type or value type of histogram. So now a sample is not just a timestamp and a floating point number, it's a timestamp and a big blob, essentially, which is all the components of a histogram: all the buckets, and the sum and the count that existed before as well, as separate series.
Starting point is 00:28:11 And most importantly, with this concept, we have just one time series per histogram, and if a bucket isn't populated, it just doesn't exist. The whole blob also contains where those buckets are located and where the gaps are, which is the
Starting point is 00:28:26 reason why sometimes they were called sparse histograms. We gave up on that name because that's just one of the properties. With this idea, we solved this additional cardinality explosion, at the price of having a more complex data structure to handle. But the most important part here is really that with the classic histograms, you define a bucket schema, and then every bucket, even if it's never used, creates a time series. And now we essentially only use storage, work, whatever resources, when the bucket actually contains some data.
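In the Go client, which as Björn notes later is where native histogram support landed first, the switch is essentially to stop listing buckets and to pick a resolution instead. A sketch, assuming a recent prometheus/client_golang version where the NativeHistogram options exist; the concrete values are illustrative:

```go
package instrumentation

import (
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

// Native histogram: no Buckets list at all. You only pick a resolution;
// a bucket factor of 1.1 means neighboring buckets grow by at most ~10%.
// Buckets come into existence only where observations actually land, and
// the whole histogram travels as one series, not one series per bucket.
var requestDuration = promauto.NewHistogram(prometheus.HistogramOpts{
	Name:                        "request_duration_seconds",
	Help:                        "HTTP request latency.",
	NativeHistogramBucketFactor: 1.1,
	// Guardrails against pathological growth inside one histogram
	// (the concrete numbers here are illustrative assumptions):
	NativeHistogramMaxBucketNumber:  100,
	NativeHistogramMinResetDuration: time.Hour,
})
```

The scraping side has to understand this too: at the time of this episode, the Prometheus server gated ingestion behind the --enable-feature=native-histograms flag and the protobuf exposition format, which comes up again later in this conversation.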
Starting point is 00:29:09 And I know you said, Björn, it's always a little tough to explain these things just on the audio track of a podcast. There are some great sessions from different conferences out there. One of them, I think it's called Native Histograms in Prometheus, is a talk from PromCon in Munich from 2022. Your colleague, Ganesh, I think, was one of
Starting point is 00:29:34 the presenters and you followed him afterwards with a demo. But folks, if you want to visualize this, Ganesh did a really good job with the visuals, right? What are the classic histograms? What are the new native histograms? And how does the bucketing work? So it's a great...
Starting point is 00:29:54 Exactly. And he uses nice cartoon graphics for that. Because another aspect... I mean, there are so many new aspects, right? Another aspect is that you can essentially change the resolution on the fly. And we have a smart schema for doing this by essentially just cutting buckets into two. So if you go one resolution higher, you just halve all the bucket widths, and so on. So we have this weird two to the power of two to the power of n.
Starting point is 00:30:22 It's kind of the formula behind it, how much a bucket grows from one to the next. So it's a lot of mathematics. It's nicely explained by Ganesh in this talk. And the result is that you can essentially pick any resolution. You can tweak it up or down, depending on how much resources you are willing to invest. And you can still
Starting point is 00:30:46 aggregate. You would essentially meet on the lowest resolution if you aggregate different histograms from somewhere else in time or space, as I like to call it. So it could be something from the past where you had a lower resolution, but it would still work. Or it's coming from a different instance in your microservice universe; you can still aggregate in all directions, forever, essentially. As long as you keep this specific way of cutting buckets, and right now in the implementation there is also no other way, so you will always use the right way.
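Since "two to the power of two to the power of n" is hard to parse by ear, here is the schema idea written out (a sketch; the sign convention for the resolution parameter n is our notation, chosen so that larger n means finer buckets). The i-th bucket boundary and the growth factor between neighboring buckets are:

```latex
b_i = \left(2^{2^{-n}}\right)^{i} = 2^{i \cdot 2^{-n}}, \qquad \frac{b_{i+1}}{b_i} = 2^{2^{-n}}
```

So n = 0 gives a factor of 2 (every bucket doubles), n = 3 gives 2^(1/8), roughly 1.09, close to the 10% growth mentioned below, and raising n by one exactly halves every bucket, which is why histograms at different resolutions can always be merged by converting the finer one down to the coarser schema.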
Starting point is 00:31:23 The only little downside here is that you cannot just say, I want a bucket at 100 milliseconds, because now you have to follow that schema, but the resolution is so incredibly high that you can have a bucket at like 99 point something, something. So it's very close. In the future, we are planning custom bucket layouts, so you could actually do this again.
Starting point is 00:31:45 Right now, I would say just use the classic histograms if you have a clear idea of where your bucket boundary is. But that would also break this promise of permanent aggregatability, or whatever the word is. But that's another big deal compared to the classic histograms: you never have to configure buckets again, and you can never have wrong buckets that don't aggregate. You just say, I want a resolution of approximately 10% growth
Starting point is 00:32:12 from one bucket to the next, which is already... Usually you have double bucket size in the classic histogram, so it's kind of a huge jump in resolution that is now feasible. And that's the only thing you pick, and then you're essentially done with your instrumentation. And that's very cool. So that means, Björn, again, I always try, if I understand this correctly, to tell it back to you.
Starting point is 00:32:38 It means if I am a developer and I want to have certain metrics as a histogram, then I basically tell the client library what the granularity is, like what's the resolution, and the client library figures out, based on the data that comes in, what this actually means in terms of bucket sizes.
Starting point is 00:32:55 And this may also change, right? Because as the data changes, the buckets will then potentially change. And now when you talk about aggregation across multiple entities, obviously this would then happen on the Prometheus server when you're executing your queries. Then this is where the aggregation happens and it goes to the lowest resolution and then gives you the result. Is this right? Did I get this right? Yes, yes, precisely. And it's all baked into the new version of PromQL.
Starting point is 00:33:27 I was initially, a couple of years ago, almost certain that this would require a major release of Prometheus. But then we figured out a way of not having any breaking changes. So now you can even handle classic histograms and native histograms; both could be in your Prometheus server.
Starting point is 00:33:51 I mean, you cannot mix and match them, but the queries look slightly different. That's in the other PromCon talk, how the queries look now. But they kind of look even simpler, because the old queries for classic histograms had to take into account that you actually have a bunch of series, and they just happen to be the buckets of a histogram. And now you have a histogram series and you apply a function to it and it's doing the right thing.
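To make "simpler" concrete, here is roughly how the two query styles compare, written as Go string constants to keep the sketch in one language (illustrative metric names; these are our examples, not quotes from the talks):

```go
package queries

// Classic histogram: the buckets are separate *_bucket series, so the
// query has to rate them, re-aggregate them by the le label, and only
// then estimate the quantile from the bucket counts.
const classicP99 = `histogram_quantile(0.99,
  sum by (le) (rate(request_duration_seconds_bucket[5m])))`

// Native histogram: the histogram is one series; rate() and sum() work
// on whole histogram samples, and histogram_quantile does the right
// thing directly on the result.
const nativeP99 = `histogram_quantile(0.99,
  sum(rate(request_duration_seconds[5m])))`
```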
Starting point is 00:34:19 And do you think it could potentially become an issue, as it's so easy now to use them, that everybody will just start using histograms and therefore, I don't know, you may run into a scalability issue, into a performance issue? I mean, I assume you've done some good performance testing on this as well, even though you said a lot of optimization went in because you don't store buckets that don't exist. But how about the scalability and performance aspect of all this? Yeah, so there is, I mean, you have concepts, right, and ideas,
Starting point is 00:34:48 and it should work. So in one of my past talks, I created a wish list, like what works, what doesn't work super well with the classic histogram, what should work much better with the new native histograms. And all those wishes became true, essentially.
Starting point is 00:35:03 And the last wish was this all should happen at a lower cost. And ideally, so that I can finally attach labels to my histograms, partition my histograms at will, and don't have to worry about the humongous cost of that. And I kind of marked this as a maybe back then. And now at Observability Day, which is one of the day-zero events at KubeCon, the recent one
Starting point is 00:35:28 in Amsterdam, I essentially gave fresh-off-the-press results about real production use cases of that. And the bottom line is essentially you get 10 times the resolution for half the price. It's so hard to compare because they're so different. And especially, I compared the use case. We had one framework, essentially.
Starting point is 00:35:53 It's Weaveworks Common. Maybe some people use this as well. Weaveworks is also an important player in the cloud native space, and they are also very early Prometheus adopters. We use an open source framework of theirs for microservices, which is already instrumented. And this is a framework we use for many microservices at Grafana.
Starting point is 00:36:17 And it gives you an HTTP server that is instrumented with classic histograms that are actually partitioned by all those labels you want. It's a very expensive histogram. And I switched this over to a native histogram with 10x the resolution. And then put it out in the wild and see what happens. And the good thing is, this talk goes into all the details. We should probably also have it in the show notes.
Starting point is 00:36:44 But the bottom line here is, for one, especially if you do all this partitioning, you get a lot of sparse bucket populations. Intuitively, it's easy to understand. Your 404s will probably
Starting point is 00:37:00 all have a very similar response time. You essentially just have to find out this is an endpoint that doesn't exist and throw a 404 error back, and it takes one millisecond, mostly, right? So your histogram for the 404s will just have a few buckets populated around one millisecond,
Starting point is 00:37:18 while your 200s will maybe have a spread because you have different workloads. But maybe then again, a certain endpoint will have a typical latency. So you get fewer and fewer populated buckets the more fine-granularly you partition your histogram. And with the new native histograms, that means less effort to store it. And then you can partition, because you now have a sublinear growth in cost. And the outcome in the end was that essentially all the buckets you have
Starting point is 00:37:49 with this original histogram, super low resolution but partitioned, it's about the same number of buckets as the populated buckets with the native histogram, same partitioning but 10x the resolution. And then it's stored not in individual time series, which gives you another lever for reducing cost. And this is where, in the end,
Starting point is 00:38:11 it's like 10x the resolution for half the price. That's where it's coming from. But the good thing is, if you run into problems because this is too expensive, you just say, okay, let's use a lower resolution. Nothing breaks. You have lower resolution, of course, but you can still aggregate
Starting point is 00:38:28 everything. It's not painful to change that, and it's very easy to adjust to your desired resource cost. One more question on the performance testing. I know you said you just turned it on and saw what was happening.
Starting point is 00:38:44 It's kind of like testing in production. But did you build any internal testing tools for that in the beginning to just create a lot of data and to see how things react? I mean, very early, that was my starting point. And that's also an important story, an important question that people ask. Those concepts we are using here are not very new.
Starting point is 00:39:05 This whole idea of having a sparse histogram, there are so many implementations for that. There have been metrics vendors that have been offering this for a while. Why is Prometheus so much behind? And one of the reasons was that with the conventional view, for example, especially if you have a vendor that just collects your metrics, you kind of collect a histogram for a minute and then you package it up, send it to your vendor, and then you start anew.
Starting point is 00:39:33 So you always have this clean slate after every minute. And my fear was in Prometheus that you have to collect the data essentially permanently because anytime somebody can come along and scrape at whatever interval they want, it's also called stateless scraping, right? So you can never say, okay, this histogram has been scraped, I can erase it now. And the important result was that collecting a histogram for a minute
Starting point is 00:40:00 often already fills a lot of buckets, and not many more buckets get filled if you collect for an hour. I call this entropy accumulation. And that was a very early experiment I did on real-life data. I just looked at latency data from our production systems, and I didn't even use a Prometheus, I just collected the data and did some math and number crunching on it to find out, okay, if I now have this bucketing schema, which bucket will be populated for how long? And then I found out that we have this nature that a lot of buckets don't get populated
Starting point is 00:40:35 and that latency is obviously not randomly distributed. You go into some entropy saturation pretty quickly. And after an hour, if you want to, you can still reset a histogram, even in the Prometheus world. It's a counter reset, as we call it. And if that doesn't happen too often, you don't lose too much data. And that was the initial breakthrough where I realized we can use existing concepts and can have those kinds of histograms in Prometheus. Fascinating. And I mean, I'm so glad that I also watched all of these talks earlier, because, you know, it's really good to have a visual in your head of what
Starting point is 00:41:19 your colleague was presenting and also what you presented. Brian, histograms, it's a big topic that we also hear about, right? All the percentile values, from the Dynatrace side. Yes, yeah, yeah. What I would like to do is get this recording to some of our engineers that have basically built this type of support into our product. I know we're also working on proper histogram support for Prometheus data as well, because we are scraping; we also understand Prometheus and we can ingest Prometheus. But it's just really fascinating to hear
Starting point is 00:41:54 what thought went into this. If you really sit down, this is for me the great thing and what gives, I think, a lot of people confidence in the whole thing. And that's why it's so popular, right? And as you say in the Zen of Prometheus, one does not simply use histograms. And I think you just really exemplified why. Because there is that much thought in it, and I don't want to say complexity in a negative way, but it's a very advanced and robust concept that people probably take for granted. Yeah, it's definitely something that thought has to be put into. Obviously you all put tons of thought into it, which is really
Starting point is 00:42:41 amazing. And then hopefully people like us can benefit from grabbing that data and letting the users use it to make everything better. To make the world a better place, as we like to say. Exactly. And Björn, I guess what you said as well: in small, like in demo settings, maybe any normal implementation would also do. But the real problem comes in as you scale,
Starting point is 00:43:10 as you scale your dimensions, as you have more data in the dimensions, at the scale where SoundCloud and also the other big players are now using Prometheus, you really need to have a very efficient approach to histograms. And efficient means it starts with storing the right data and not storing data that is meaningless. Thank you so much for the enlightenment. This was really, really well explained. Björn, what's next? Are you done now? Can you finally retire, or what's the next big topic? I was just about to say, we're not even done with the histograms.
Starting point is 00:43:47 The sad news is that the full instrumentation library support is currently only in the Go client. And there it's really, I mean, Brian alluded to that, it's so simple to do it. You essentially say, give me a 10% resolution, like bucket-to-bucket growth, and then there's your native histogram. You don't have to think about all the thoughts that we put into it, and it just works. The Java instrumentation library has preliminary support. And the huge
Starting point is 00:44:18 roadblock here is that this histogram representation works really nicely with Protobuf, which is the reason why we kind of resurrected the Protobuf scrape format, which was already declared dead. The Secret History of Histograms talk goes into detail on how that happened. And we want to create a text representation for the native histograms as well. But that's a hard nut to crack, and people are working on that right now. And that would unblock clients that have never touched Protobuf,
Starting point is 00:44:53 like Python or the Ruby client, and also make it really simple for third-party providers of instrumentation libraries, so that it's everywhere. But right now, everyone is probably pumped and wants to try it out, but if you're not using Go, you cannot try it out easily.
Starting point is 00:45:09 That's the sad news, and that still has to be done. So everybody rewrite all your code in Go, re-architect your entire organization for Go, and the excuse will be just so you can get the histograms. And here we go. The business case. Here we go. There you go.
Starting point is 00:45:23 Hey, now we can't stop saying Go. Oh, brother. Yeah, and I just have to ask, you know, on the backend Prometheus side: is Prometheus being used to monitor Prometheus? It's always... Yeah, of course. Yes, of course, that's the answer. Yeah, that was one of the insights. I mean, now it's kind of, everyone talks about that or has already realized that. But let's say 10 years ago, that was still a big deal, talking about developers that never are concerned with any kind of ops work.
Starting point is 00:45:58 And this all, I wouldn't even call it shift left or shift right. It's just that the kinds of tasks you do become more similar. I became an SRE in 2006, when nobody knew what that is. But the whole idea of using essentially a software engineering approach to operational functions, that was already very, very important in the very complex world we have now. I mean, back then, big internet giants had this problem, and now everyone has this problem.
Starting point is 00:46:30 But also for developers, that they say, okay, I have to be concerned about instrumenting my code, and I can actually use it in my debug cycles. If you have instrumented your code with all the signals that are out there, you can use it to optimize your software, to debug your software. And of course, this was such an, I don't know, it was such an enlightenment in a way to talk about the fire again, that you link in the Prometheus instrumentation library. You don't even instrument a single thing,
Starting point is 00:47:03 but it gives you all the runtime metrics. It gives you process metrics. For Go binary, you get Go runtime metrics. For Java binary, you get Java runtime metrics. And just having this all the time, at any time you can look, okay, what's my heap size, like all those
Starting point is 00:47:17 things, and you have hopefully collected this over time. This is so valuable, even for the development process and optimization and everything. And now, of course, we do this with other signals, like continuous profiling becomes a thing now. And yeah, I mean, all of that helps during development. And so it's not even shift left or shift right.
Starting point is 00:47:37 It's just like shift everywhere. And of course, Prometheus, we instrumented Prometheus with Prometheus from the beginning, and it was super valuable. But also, Go is coming from Google as well, and they had these insights from the beginning. So Go comes with built-in profiling endpoints and really good debug tooling, profiling tooling.
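As an aside, the "link in the library and you get runtime metrics for free" point from a moment ago fits in a program this small (Go; a minimal sketch, and the port is arbitrary):

```go
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
	// Not a single metric is defined in this program, yet /metrics will
	// already expose Go runtime metrics (heap size, GC pauses, goroutine
	// counts, ...) and process metrics (CPU, RSS, open file descriptors),
	// because the client library's default registry ships with the Go
	// runtime and process collectors.
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":2112", nil))
}
```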
Starting point is 00:48:06 And of course, that helped us so much for optimizing Prometheus itself. Which is, of course, not distributed tracing, so it's not as exciting for you, I guess. But it's kind of tracing, if you want, and it helps a lot. And Prometheus itself is just a single binary server, and it's kind of simple on purpose. Which doesn't cover... if vendors implement the Prometheus API, then of course they have distributed systems, and then they start to get into
Starting point is 00:48:37 all the nice additional complications. And of course, Prometheus is also instrumented with OTel tracing, right? So if you want that for Prometheus alone, it might make sense in some situations to do this. But also, if you just use the code and you link it into your implementation
Starting point is 00:48:56 of a distributed Prometheus it's good that it's all there and you use all those signals and it's a perfect full circle right? Hey Bjorn that it's all there and you use all those signals and it's a perfect full circle. Bjorn, after 10 years of Prometheus and I'm pretty sure
Starting point is 00:49:15 you are still excited about histograms and there's still so much stuff to do. I assume this is not going to be the only podcast we do with you. At least I hope so. It's not going to be the only one, but we will do more with you as new things come up that are relevant for our listeners, because I think this is extremely relevant. And I think from multiple angles.
Starting point is 00:49:36 The one angle is because most of the people that we interact with, Brian and I, Prometheus is just there. And so it's for us great to learn more about it, understand it better. But also what for me personally was very interesting, just the performance aspect of Prometheus itself. That's also interesting because we have a lot of listeners, I believe, that are or at least have a background
Starting point is 00:49:59 in performance engineering. And this is why also thanks for giving us a little bit of insights there as well. Yeah. But with this, I don't know, did we miss anything?
Starting point is 00:50:10 Anything beyond that you need to get off your chest that you think this is something you need to say? Maybe you should invite my colleague
Starting point is 00:50:17 Brian Bohem who has done a lot of optimization PRs recently. Like every Prometheus minor release had another X percent CPU or memory decrease
Starting point is 00:50:28 because he kind of did the profiling dance and found another thing, and is also really good at finding those things and making them better. And then you realize, oh my gosh, we wasted so much memory all the time because we didn't write the proper code. But yeah, that's how software engineering works.
Starting point is 00:50:48 I was going to say, to state the obvious of what people often miss: oh, we're using so many resources because we didn't write the proper code. Like, yes, that's the source of so much of our business. That's an amazing observation. All right, Brian, should we bring it home? Let's bring it home, Andy. Um, did I call you handy?
Starting point is 00:51:15 I said home, Andy, and in my head I heard handy. I'm going to start calling you handy now. You're a very handy person to have around. See, you're valuable, Andy. More so than I am on this podcast. Except, I think we always have different episodes, right, where one of us is just more... I know, I was talking a lot today because, oh yeah, this was a lot. It's still fresh in my head because I watched the documentary this morning. Yeah, I think it's amazing there's a documentary about a technology that's not about the numbers and words or the deep side of it.
Starting point is 00:51:53 I don't know if that's a first. I mean, obviously there are histories of Windows and Microsoft and stuff like that, but for there to be a documentary about something like Prometheus is quite amazing. I think the other big takeaway for me, at least today, was that maybe people don't have to go as deep as Björn and team did on things like histograms, but I think it's important for people to understand the metrics they're using. Right? Because when you talked about histograms and how you can't just take an average of the percentiles, or the percentile of the percentiles, however it was you were explaining it in the beginning when you first
Starting point is 00:52:30 started talking about the histograms: if you don't know what's behind the metric you're using, how can you properly use it? So I think that's an important thing. And that even came to light way back in our early days, Andy, when everything was averages, and then people were like, yeah, you should really be looking at maybe a 50th and 90th percentile, because your average can be wildly wrong. And then like, oh wow, now that I think about it, which I never took the time to do before, it makes total sense. So that just proves that point some more. So anyway.
Starting point is 00:53:03 Yeah. Thank you, Björn, so much for being on, and Andy, thanks for arranging this. I hope everybody got something out of this one today. It was amazing to have someone who started Prometheus, well, I guess, jumped on board at the very beginning. I don't want to give you credit because you don't want the credit of starting it, but you were right there at the beginning, we'll say, and I won't use your title. I don't want to put onus on you. But yeah, someone who was there from basically the beginning, who's a main contributor. And let's just call you responsible for all of Prometheus. Let's just build you up. Fantastic having you on.
Starting point is 00:53:39 What? And the internet. And the internet, yes. And the internet. You gave Al Gore the idea. So it's an amazing honor to have you on. Thank you so very much. And we hope everyone enjoyed it. And look forward to having you on again. And thanks for everyone for listening. Great.
Starting point is 00:53:57 Thank you very much. Thank you. Bye-bye. Bye-bye.
