PurePerformance - The De-Facto Standard of Metrics Capture and Its Untold Histogram Story with Björn Rabenstein
Episode Date: June 19, 2023

As far as we know, besides Kubernetes, Prometheus is the only project that belongs to the prestigious group of open-source projects that have their own documentary. Now why is that? Prometheus has emerged as the go-to solution for capturing metrics in modern software stacks, earning its status as the de facto standard. With its widespread adoption and a constantly expanding ecosystem of companion tools, Prometheus has become a pivotal component in the software development landscape.

Join us as we sit down with Björn Rabenstein, an accomplished engineer at Grafana, who has dedicated nearly a decade to actively contributing to the Prometheus project. Björn takes us on a journey through the project's early days, unravels the reasons behind its meteoric rise, and provides insightful technical details, including his personal affinity for histograms.

Here are the links we discussed during the podcast for you to follow up:

Prometheus Documentary: https://www.youtube.com/watch?v=rT4fJNbfe14
First Prometheus talk at SRECon 2015: https://www.youtube.com/watch?v=bFiEq3yYpI8
The Zen of Prometheus: https://the-zen-of-prometheus.netlify.app/
Talk from Observability Day at KubeCon 2023: https://www.youtube.com/watch?v=TgINvIK9SYc
Secret History of Prometheus Histograms: https://archive.fosdem.org/2020/schedule/event/histograms/
Prometheus Histograms: https://promcon.io/2019-munich/talks/prometheus-histograms-past-present-and-future/
Native Histograms: https://promcon.io/2022-munich/talks/native-histograms-in-prometheus/
PromQL for Histograms: https://promcon.io/2022-munich/talks/promql-for-native-histograms/
Transcript
It's time for Pure Performance!
Get your stopwatches ready, it's time for Pure Performance with Andy Grabner and Brian Wilson.
Hello everybody and welcome to another episode of Pure Performance.
My name is Brian Wilson and as always Andy Grabner over there is making fun of me as I do my intro.
It wouldn't be Andy without him. So hello Andy and how are you doing today?
I'm not making fun of you, I'm just trying to make you laugh. Well, mimicking me, mocking me, trying to make me crack a smile on camera so all our viewers can see. You know, speaking of viewers, Andy,
I gotta tell you, I had another weird dream. I don't know how I got from one to the other. Yeah.
So it was very bizarre. It was getting to be nighttime and we had this big power outage,
right? And my daughter Adele wanted to stay up. She was very ambitious,
had a lot of things she wanted to get done. So I went and got her a lantern and I gave her a lantern,
which gave her the ability to stay up longer and, you know, go through the night quite a lot longer.
And then out of nowhere you show up at my house and you're like, Brian. I'm like, yeah? And you're like,
your daughter Adele is young and she must go to sleep. What are you doing giving her this lantern? I forbid it. And I'm like, but come on, Andy, she's working on a project. And you're like, I say no. And you took and tied me up to a pole, right? And I'm like, what's going on here? Yeah, Andy, I'm pretty open-minded, but I'm not sure I'm so into this, right? And you're like, Brian, you're going to learn your lesson.
And then through the front door, good friend of the show, Mark Tomlinson walks in.
And I figure he's going to come save me. Right?
And what does he do?
He pulls out a knife, cuts open my side, and starts picking away at my liver.
And I'm like, what the hell is going on here?
And then I woke up and I realized we had the podcast today.
So I had to get that.
So I don't know what ended up happening.
But it's the second time I've had that dream at least.
So I don't know what's going on.
I think you should start thinking about talking to some people.
They can help you with this because it doesn't really sound normal.
I was actually thinking you were going to, with the lantern,
you were going to something like enlightening a flame.
Because obviously a lantern itself doesn't do anything.
Well, it would have to go straight to the flame.
The lantern is the flame, and the punishment is the same.
Yeah.
All right.
Come on.
That was the story of Prometheus.
Look at that.
It took us a long, long time to get there.
That wasn't too long.
Come on.
No, it's good.
It's good.
Hey, well, Brian, thank you for sharing the dream,
but also so much thanks goes to Bjorn,
who is with us today.
He's not only a guest,
but he's also a listener, as he shared with us.
So Bjorn, servus.
Thank you for being here.
How are you?
I'm good.
Thank you.
I hope you are good as well.
I'm really glad to be here.
It always feels like magic
showing up in a podcast
that you have listened to before.
And I hope you're okay with all these strange stories. And maybe we will find some help for Brian in the future to make sure that his dreams become less violent.
So let's make sure this podcast goes smoothly and that you will have beautiful dreams tonight.
We had this saying even in the early days of Prometheus,
sometimes using Prometheus feels like bringing fire to humanity and sometimes
it feels like your liver gets picked by an eagle.
Hey Bjorn, talking about Prometheus,
I think some people may know you because you've been speaking
at different events. We met each other just a couple of months ago
at KubeCon in Amsterdam and then I looked back in history,
you sent a couple of links over, you've been speaking at different Kubernetes days,
different cloud-native events, Prometheus events.
There's even, and thanks for the pointer to this,
there's a Prometheus documentary that has been out there, I think, for like six to seven months,
which really gives a great overview
of where it started, why it actually
became that popular.
I remember one of your colleagues
who was on the documentary, he said there was like
a special moment when somebody brought Prometheus
on Hacker News, and then all of a sudden it took off
and you still try to figure out who that was.
So in case whoever that was is listening in, let us know,
because the world is trying to figure out who made this magic moment happen.
Björn, maybe from you, can you give a little bit of a background
just for people that are maybe not familiar with the story
and how it was from your perspective?
Yeah, I mean, the documentary has this interesting quality
that is mostly non-technical.
It really tells the non-technical part of the story.
And that's maybe more interesting these days, as many people in our profession are very familiar with Prometheus from the technical side. The story from my side is probably that I got kind of lured over to SoundCloud by Julius
and Matt, who started the whole project very early in the lifecycle, like 2013, I think.
They told me they are doing some open source monitoring in the same spirit as what we all
know from Google. I was still working at Google. I had the privilege of working from essentially working from home, which back
then was a weird thing, especially for Google.
And it was working out great.
But I am actually not following the trend.
I'm a great fan of having like my colleagues around me in the same room.
And yeah, so I was kind of tempted to change jobs.
And then they told me about that project.
And a bit as in the documentary where I think I'm saying that
if I had been Julius's and Matt's manager,
I would not have approved the project.
That was just a very, very weird idea they had.
And I mean, I joined when the, let's say,
there was a working prototype.
There was actual monitoring happening at SoundCloud
with this very early version of Prometheus when I joined.
So all that credit goes to Matt and Julius
and the people who helped them at SoundCloud.
But then I joined, and from the perspective of now,
it's like 10 years later,
I kind of joined from the beginning, you could say, right?
With a bit of error of measurement.
But even back then, I had no idea that this would, as I like to phrase it, change how the world
is doing monitoring. Back then it was like, okay, all the ex-Googlers, they will understand
it because they know the idea behind it. Then there will be five or six other people in
the world who also will understand it and appreciate it. And that's it,
right? That was what I thought and it will be a fun experience. Yeah, and then the rest is history.
Everything changed 10 years later. Yeah. Yeah, I mean, Prometheus, as you said,
right? I assume most people that listen to us know what Prometheus is. For the handful that doesn't
know it, right? As you said, it's the de facto standard when it comes to capturing metrics
and collecting metrics from your environment,
wherever it runs.
I think in the early days when you started at SoundCloud,
as far as I recall the documentary,
you had your own orchestration engine.
You didn't use Docker, it was just starting.
There was no Kubernetes back then,
but you built this yourselves.
And now when the CNCF started, they actually asked you to become part of the foundation.
And therefore, you've been an early project and you are the de facto standard when it comes to metrics.
What I also thought was really interesting, Björn, to what you said earlier,
you said, why should we build our own monitoring tool
when we're not a monitoring company?
This was also an interesting quote in the documentary
because there's always a big debate.
Why do you build something yourself
if this is not your core business?
Because SoundCloud is not in the monitoring space.
So can you give some recommendations
on when you feel it is important to take that risk?
Yeah, so my recommendation is don't.
I mean, we have a huge bias
because we only talk about the projects
that became a success, right?
So who knows how many little Kubernetes or Prometheis or whatever other people created back then. Because, to be honest, this is the first thing: if there is something already out there that does the job, then use it, right? Even if it's not a 100% match. And back then, this is what I say in the documentary, right? We would have muddled our way through with Nagios, but Nagios would have done only like 10%
of the job, right? And StatsD, which was already in active use at SoundCloud, that was actually
also a very important innovation that did perhaps 20% of the job, right? But then we still had a lot left and there were even vendors who would collect metrics for you back then.
It was just even more expensive than nowadays.
So that was kind of not sustainable.
And in that situation, there was enough motivation to do it.
But even then, it was a huge risk. And we only know, we are only here talking
because Prometheus made it past that, whatever it is called, the great filter or something.
Sometimes they talk about this when they talk about alien civilizations or something.
And so this is hugely biased, right? So I really can't recommend that everyone just try your not invented here thing and do your project and hope it will become a popular open source project.
I mean, I guess that was really good.
I mean, there was an important point to make why SoundCloud came up with their little, it was called Bazooka.
I think it's also in the documentary somewhere.
It's essentially a mini Kubernetes before there was Kubernetes.
And then Prometheus,
which is, yeah, now we know it
as kind of the monitoring system for Kubernetes,
but it was actually created
before anyone knew about Kubernetes,
outside of Google, at least.
But yeah,
I would really still say
this is the last resort you should take.
And you can never, we didn't expect it, right?
And you can never expect it to become a popular open source project that changes history.
Andy, that reminds me of several years ago at this point we had,
I don't remember the details of who was on or the specifics of what it was,
but I think it was about when to use off-the-shelf software, when to build your own, or when to modify something existing. And there were some guidelines around that which seem similar to what Bjorn is saying: if you can use what's out there, use it. It's when the gap is large enough that you should look into creating it on your own.
This conversation applies to several different aspects. Monitoring, obviously, but also just software packages and all that fun stuff.
Especially, I think you also mentioned this in the documentary,
there was an architectural shift.
I mean, things changed. All of a sudden we were talking about many moving pieces, with many microservices that are coming and going.
And the observability tools, back then we called them monitoring tools,
now we call them observability tools,
were built for the previous generation of architectures.
I mean, Brian and I, we've been at Dynatrace for many, many years.
And looking back 10 years ago,
the normal tech stack and the classical architectures
were not what we see now.
And we focused on this.
And I think now, in 2023,
we are really grateful to have Prometheus
as an amazing data source that we can use to enrich the data that we collect from other areas of the stack.
But as you said, there was a shift in architecture, a shift in technology.
There were parts of it available, but not really built for that type of architecture.
And then, as you also said in the documentary, the Hacker News thing happened and Kubernetes happened, Cloud Native took off, and that basically was the perfect
ingredient for takeoff. But I guess you can never plan for this.
Yeah, I mean, there were a bunch of projects like
Prometheus because the time was ripe, but maybe we were really the first.
But this moment
where Google realized they
cannot just run Kubernetes on their
own because no
other vendor will
trust it. They needed some kind of
foundation, and then they realized we want
a foundation not only for Kubernetes,
but for cloud native,
whatever that will be, and then realized
we really need some monitoring for that.
And then they stumbled upon Prometheus.
I mean, essentially the Kubernetes people stumbled upon Prometheus pretty quickly.
And then we got this call.
I think I'm also saying that in the documentary, right?
Which was kind of very exciting.
And yeah, I mean, they of course were happy because
they knew the
concept of Prometheus
was very familiar to them as they were all
coming from Google and knew
that and they of course couldn't use
the internal monitoring for the
open source project and now they could
use something that at least
structurally was similar
enough that it makes sense to them.
And Bjorn, you mentioned earlier
you had your 10-year anniversary, right?
I think in the notes that you sent over,
your first commit was on November 24 in 2012.
So that's a little over, it's like 10 and a half years now.
And how did you, I mean, you've probably touched pretty much every part of Prometheus,
even though I know in the recent history, you know, you had a lot to do with the new histogram support,
which we will talk about as well.
But you've seen pretty much everything, right?
I mean, it's so big.
There are certainly parts that I've never touched.
Also, like the commit you were talking about
is the very first in Prometheus at all.
That was more or less a year before I joined SoundCloud.
And it was by Matt Proud.
So my first commit is probably sometime late 2013.
I haven't even looked it up.
But this is when I really joined SoundCloud as a proper
employee and started to work for real on Prometheus. And initially it was clearly like,
there were a few handful of people that were mostly sitting in the same room, which is
another trap you can fall into with an open source project. If you are like this gang of people in
the same room, then you might accidentally exclude others who are not in the same room. But maybe that's a different story. Yeah, then, of course,
everyone is kind of in touch with everything. But then there's Prometheus as a whole ecosystem,
instrumentation libraries for all kinds of languages, exporters, as we call them, sometimes
confusing the name, Prometheus itself. But then there are so many vendors,
other systems that implement the Prometheus APIs, maybe based on the same code base, maybe
just mirroring the functionality. Alert Manager is a huge project on its own, if you want.
I mean, it's part of the Prometheus org, but it's huge. And so on. There's a bunch of those things.
And as you said, there's so much stuff out there already
and the project has been 10 plus years. We're not going to talk about
the basics and how to get started. So folks, we will add, if you listen
and you want to learn the basics and what this is all about, we will add links to
all the relevant Git repositories,
documentation, tutorials.
So you will find this all in the description.
Your favorite subject, right?
And I remember when we met a couple of months ago in Amsterdam,
we were actually at a little celebration party
from some of the observability vendors
that actually were together.
And then you said,
hey, my favorite topic is histograms.
Histograms, histograms.
Yeah, that's your favorite topic.
And first of all, why histograms?
Why is this such an exciting topic?
What's the history of the histograms?
Yeah, exactly.
I mean, they're one of,
I gave many, many talks about Prometheus
and then a lot of them are about histograms.
And one is actually called Secret History of Histograms,
which gives you all the background.
But there's one, I think this anecdote is not even in this talk,
which is my proof that it was always my favorite topic
because we had the very first talk about Prometheus at a
real conference. We had like meetups before, I think, local ones. But there was SRECon 2015
Dublin, where the very first talk about Prometheus at an international conference was happening.
All the people from what we would call now the cloud-native community were in the audience.
And Brendan Burns, like you could say he's the inventor of Kubernetes,
was in the audience.
And Q&A starts, and Brent Burns raises his hand,
and he asks, does it support histograms?
And my response is, that's my favorite topic.
So even back then, right, eight years ago.
And indeed, I think this was because
that monitoring thing at Google
was not really good at histograms in a way,
but it is so important.
Like the whole SLO,
I mean, a lot of the SRE,
like the tools in the SRE toolbox
rely on histograms,
rely on histograms: SLO tracking, Apdex score.
Interesting thing is like tail latency. Tail latency is so important in
distributed systems. So there are many, many scenarios where you want
to not just get an average, you want like percentiles, or you just want
to see a distribution, let's say in a heat map. And
all of this can be done by histograms.
And we always wanted to have proper histogram support, quote unquote.
But of course, we just had this, like 2015 was like three years, two and a half years
after the first commit.
So we had essentially a proof of concept.
And we had to do it somehow, and the idea we came up with, I always saw this as essentially a prototype. We came up with what we now call classic histograms in Prometheus, where every bucket is its own time series. Whoever has worked with that knows how that works and knows the pain. And it works really well with the whole execution
and collection model of Prometheus with PromQL.
We needed to add one single function to PromQL,
histogram quantile, to make quantile calculation work.
But we kind of shoehorned this rather flat worldview of Prometheus, where every time series is just a series of floating point numbers, into this histogram thing.
And it did its job.
You could do Apdex score, you could do SLO calculation,
you could even draw heatmaps if you really wanted to.
They didn't have a really high resolution.
But the most important thing was that it was mathematically sound. Because back then, mathematicians knew about that, of course, but normal people like you and me didn't realize how percentiles work. They thought, I can get the 99th percentile from every instance of my microservice, and then I average them all to get the 99th percentile of my whole microservice. No, it doesn't work that way, and people don't believe it.
I actually crafted an example with real numbers on the slide
where you could see that it's completely different.
Very counterintuitive and the mathematicians were
preaching on the mountain all the time, but
it took a long time until people realized,
especially if you want to aggregate many, many tasks of a service, which is this typical
microservice case, distributed systems case, you essentially need histograms or some kind
of what people call digest, just kind of a compressed histogram, if you want.
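(To make that concrete, here is a small sketch in Go with invented numbers, not Björn's slide example: two instances of the same service, one fast and busy, one slow and rarely hit. Averaging their per-instance 99th percentiles gives a very different answer than computing the 99th percentile over all requests, which is why you need the full distribution, i.e. a histogram, to aggregate correctly.)

package main

import (
	"fmt"
	"sort"
)

// percentile returns the q-quantile (0 < q <= 1) of values using the
// nearest-rank method.
func percentile(values []float64, q float64) float64 {
	s := append([]float64(nil), values...)
	sort.Float64s(s)
	idx := int(q*float64(len(s))) - 1
	if idx < 0 {
		idx = 0
	}
	return s[idx]
}

func main() {
	// Invented latencies in seconds: instance A serves 1000 fast requests,
	// instance B serves 10 slow ones.
	instA := make([]float64, 1000)
	for i := range instA {
		instA[i] = 0.05 // 50 ms
	}
	instB := []float64{1, 1, 1, 1, 1, 1, 1, 1, 1, 1} // 1 s each

	avgOfP99s := (percentile(instA, 0.99) + percentile(instB, 0.99)) / 2 // 0.525 s

	all := append(append([]float64(nil), instA...), instB...)
	trueP99 := percentile(all, 0.99) // 0.05 s: the slow requests are under 1% of traffic

	fmt.Printf("average of per-instance p99s: %.3fs, real p99 over all requests: %.3fs\n",
		avgOfP99s, trueP99)
}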
And at least the way we did the whole thing in Prometheus, that worked. It just had those problems: the resolution was really low. So if you actually
wanted to calculate a quantile, the precision could be really bad. And it was quite expensive
because every bucket was its own time series, which is kind of the heavyweight element in the Prometheus TSDB.
So if your bucket boundaries were appropriate, you could do Apdex scores, SLO calculations, all those things worked nicely.
But if you wanted high precision quantile estimates, that was bad.
You also had to pick the right
bucket boundaries. If you realized at some point
we want different boundaries,
that was really painful, because changing those boundaries might mean you cannot aggregate anymore.
So a bunch of
yeah,
what do I call it?
PITA, right?
I don't say it.
Pain in the neck.
That's kind of, I think, the civilized way.
Yeah.
So that means, Bjorn, just quickly to recap,
if I understand this also correctly,
that means in the classical histograms, as you call them,
you as the person that wanted to expose histogram data,
you had to define basically what are your buckets.
And basically you made this based on your assumption of what good buckets would be for that particular type of metric.
Exactly right.
So a good signal is if your SLO or SLA, let's call it a real SLA,
you have an SLA with your customer that says,
we will serve 99% of requests within 100 milliseconds.
Then you know you want a bucket bomb at 100 milliseconds.
Great, right?
But if that SLA changes next month to 80 milliseconds, not so good.
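(For illustration, a minimal sketch of such a classic histogram, assuming the Go client library prometheus/client_golang; the metric name and boundaries are made up. The 0.1-second boundary only exists because today's SLA is known at instrumentation time, which is exactly the problem Björn describes.)

package main

import (
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

// A classic histogram: the bucket boundaries are fixed in the code.
var requestDuration = promauto.NewHistogram(prometheus.HistogramOpts{
	Name:    "http_request_duration_seconds",
	Help:    "HTTP request latency.",
	Buckets: []float64{0.025, 0.05, 0.1, 0.25, 0.5, 1}, // 0.1 matches the 100 ms SLA, for now
})

func main() {
	start := time.Now()
	// ... handle a request ...
	requestDuration.Observe(time.Since(start).Seconds())
}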
And also because of that cost at SoundCloud, we had a lot of three-bucket histograms,
which is kind of not really what you want resolution-wise.
And because of the cost,
you also would be very judicious with labels.
And that's like you would, let's say you have an HTTP server,
you want to partition by status code, endpoint, method,
all those things, right? And then every partitioning is already a cardinality problem,
the usual thing. But now you have, let's say, 10 buckets in your histogram, that increases the
problem by an order of magnitude. So people were really like, okay, we just have a counter for all
those individual metrics, but we just have one big histogram.
But then later you want to know,
okay, is this latency perhaps only happening in the 404s
or only happening for this host on that endpoint?
And then you can't slice and dice,
which is completely against the Prometheus philosophy, essentially.
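(The arithmetic behind that cardinality hit is easy to sketch; the label counts below are invented for illustration.)

package main

import "fmt"

func main() {
	// Hypothetical label values for one HTTP server.
	statusCodes, endpoints, methods := 5, 20, 4
	combinations := statusCodes * endpoints * methods // 400 label combinations

	counterSeries := combinations // a counter costs one series per combination: 400

	// A classic histogram with 10 buckets adds one series per bucket,
	// plus the implicit _sum and _count series.
	buckets := 10
	histogramSeries := combinations * (buckets + 2) // 4800

	fmt.Println("counter series:", counterSeries, "classic histogram series:", histogramSeries)
}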
I mean, first of all, you have to know during instrumentation
what are interesting latencies.
That's against the philosophy.
And then you cannot partition, at least not as freely as you usually could.
I mean, you can never freely partition because you always have a cardinality problem.
But it's even worse with the classic histograms.
And it's also, I mean, I think that the big challenge is, if I can get this correctly,
is that you have to ask your engineers to actually put in these boundaries in their code.
And there's no separation, not a good separation of concerns
on the type of data that you want to collect,
and then what you enforce on the data.
Yeah, we have this.
I have a talk at a meetup somewhere, which is called
Prometheus Proverbs, like the Go Proverbs that you might know, and a colleague made
it into Zen of Prometheus, I think.
He even created a website, that's Kemal's website, we might link that as well.
I'll take some notes here as well, to make sure we, so the Proverbs of Prometheus or
the Zen of Prometheus.
The website is called Zen of Prometheus and it has more than just what's in my talk. I was trying to act like Rob Pike. Some of the proverbs were made up by me, some were made up by others in the community, but one was: instrument first, ask questions later.
That was the whole idea of you just put a metric.
Metrics are relatively cheap. They're much cheaper than other observability signals.
So as long as you don't fall into cardinality trap, you are pretty free of adding metrics
everywhere. And then the idea is that you don't put assumptions in while you instrument.
This is why you use counters and not gauges,
because a gauge would be requests per second already.
But if you just count requests, you can later decide,
do I want to average over the last 10 seconds or the last 10 minutes?
All those things.
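(A short sketch of that proverb in practice, again assuming the Go client library: the instrumentation only counts, and the averaging window is picked later in PromQL.)

package main

import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

// A plain counter: no assumption about interesting time windows is baked in.
var requestsTotal = promauto.NewCounter(prometheus.CounterOpts{
	Name: "http_requests_total",
	Help: "Total number of HTTP requests.",
})

func main() {
	requestsTotal.Inc() // on every request

	// The window is chosen only at query time, e.g. in PromQL:
	//   rate(http_requests_total[10s])  // per-second rate over the last 10 seconds
	//   rate(http_requests_total[10m])  // per-second rate over the last 10 minutes
}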
And similar with histograms,
why do I have to decide what latencies are interesting
and what
resolution I want? I mean, all those things, that's not what we wanted. It's against this
proverb. And that's why I knew already in 2015 we need to do something else, but it took a long time.
And one reason is that it essentially doesn't just fit into the existing execution model and data model. It required a lot of changes throughout the stack.
And that's why it took so many years.
I mean, not that I worked on this all the time.
There were many other things I had to work on.
But like in recent years, that was my usual topic I worked on.
So then, how did you solve the problem?
I mean, what's the situation now after you learn from the classical histograms,
what worked and what didn't work?
What's the new histograms?
So from Prometheus' point of view,
the new thing is that we now have a new metric sample type, like Prometheus
and PromQL, the execution, the query language, was always
strictly typed or statically typed, as in there are counters
and gauges and stuff like that. But the
value type was always this infamous floating point number,
which was a deliberate decision and simplified many things.
Also, not having an integer.
Some people freak out because there are no integers,
which is another discussion.
But now we have this Prometheus histogram data type.
I should always say, in text, I usually capitalize histogram
when I mean the Prometheus data type,
and I have lowercase histogram when I mean the statistical concept.
In a podcast, it's hard to distinguish.
Okay, so we have this new data type or value type of histogram.
So now a sample is not just a timestamp and a floating point number, it's a timestamp and a big blob, essentially, which is all the components of a histogram: all the buckets, and the sum and the count that existed before as well, as separate series.
And most importantly, with
this concept, we have just one time
series per histogram, and if a
bucket isn't populated, it just
doesn't exist. The whole
blob also contains
where are those buckets located, where are
gaps, which is the
reason why sometimes they were called sparse histograms. We gave up on that because that's
just one of the properties. With this idea, we solved this additional cardinality explosion
at the price of having a more complex data structure to
handle.
But the most important part here is really that with the classic histograms, you define
a bucket schema and then every bucket, even if it's never used, creates a time series.
And now we essentially only use storage, work, whatever, resources,
when the bucket actually contains some data.
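(As a rough mental model only, not the actual Prometheus internals or exposition format, a native histogram sample could be pictured like this in Go; the field names are illustrative.)

package sketch

// NativeHistogramSample is a conceptual sketch of what one native histogram
// sample carries: the whole distribution in a single value, with only the
// populated buckets present.
type NativeHistogramSample struct {
	TimestampMs int64

	Count float64 // total number of observations
	Sum   float64 // sum of all observed values

	Schema int32 // resolution; bucket boundaries grow by a factor of 2^(2^-Schema)

	ZeroCount float64 // observations in the special bucket around zero

	// Only populated buckets are stored. Spans say where runs of populated
	// buckets sit on the logarithmic index axis and where the gaps are,
	// which is what made people call these "sparse" histograms.
	PositiveSpans  []BucketSpan
	PositiveCounts []float64
	NegativeSpans  []BucketSpan
	NegativeCounts []float64
}

// BucketSpan is a run of consecutive populated buckets, offset from the end
// of the previous span.
type BucketSpan struct {
	Offset int32
	Length uint32
}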
And I know you said, Bjorn,
it's always a little tough to explain these things
just on audio track on a podcast.
There's some great sessions
from different conferences out there.
One of them, I think it's called Native Histograms
in Prometheus.
It's a talk from PromCon in Munich from 2022. Your colleague, Ganesh, I think was one of
the presenters and you followed him afterwards with a demo. But I think that's, folks, if
you want to visualize, because Ganesh did a really good job in kind of having visuals,
right? What's the classical histograms?
What are the new native histograms?
And how does the bucketing work?
So it's a great...
Exactly.
And he uses nice cartoon graphics for that
because another aspect...
I mean, there are so many new aspects, right?
Another aspect is that you can essentially change the resolution on the fly.
And we have a smart schema for doing this
by essentially just cutting buckets into two.
So if you go one resolution higher,
you just half all the bucket width and so on.
So we have this weird two to the power of two to the power of n.
It's kind of the formula behind it,
how much a bucket grows from one to the next.
So it's a lot of mathematics.
It's nicely explained by Ganesh in this talk.
And the result is that you can essentially pick any resolution.
You can tweak it up or down,
depending on how much resources you are willing to invest.
And you can still
aggregate in, you would essentially meet on the lowest resolution if you aggregate different
histograms from somewhere else in time or space, as I like to call it. So it could be
something from the past where you had a lower resolution, but it would still work. Or you
have like just coming from a different instance in your microservice universe,
you can still aggregate in all directions forever, essentially.
As long as you keep this specific way of cutting buckets,
right now in the implementation, there is also no other way.
So you will always use the right way.
The only little downside here is that you cannot just say,
I want a bucket at 100 milliseconds,
because now you have to follow that schema,
but the resolution is so incredibly high
that you can have a bucket at like 99 point something, something.
So it's very close.
In the future, we are planning custom bucket layouts,
so you could actually do this again.
Right now, I would say just use the classic histograms
if you have a clear idea of where your bucket boundary is.
But that would also break this promise
of permanent aggregability or whatever the word is.
But that's another big deal compared to the classic histograms: you never have to configure buckets again,
and you can never have wrong buckets that don't aggregate.
You just say, I want a resolution of approximately 10% growth
from one bucket to the next, which is already...
Usually you have double bucket size in the classic histogram,
so it's kind of a huge jump in resolution that is now feasible.
And then that's the only thing you pick.
And then you're essentially done with your instrumentation.
And that's very cool.
So that means, Bjorn, again, I always try to,
if I understand this correctly, I want to tell it back to you.
It means if I am a developer
and I want to have certain metrics as a histogram,
then my client library is basically,
I'll tell the client library what the granularity is,
like what's the resolution,
and then the client library figures out,
based on the data that comes in,
what this actually means in terms of bucket sizes.
And this may also change, right?
Because as the data changes,
the buckets will then potentially change.
And now when you talk about aggregation
across multiple entities, obviously this would then
happen on the Prometheus server when you're executing your queries. Then this is where the
aggregation happens and it goes to the lowest resolution and then gives you the result.
Is this right? Did I get this right? Yes, yes, precisely. And it's all baked into the new version of PromQL.
I was initially, a couple of years ago,
I was almost certain that this would require a major release of Prometheus.
But then we figured out a way of not having any breaking changes.
So now you can even handle classic histograms
and native histograms that could be both in your Prometheus server.
I mean, you cannot mix and match them,
but the queries look slightly different.
That's in the other PromCon talk,
how the queries now look like.
But they kind of look even simpler
because the old queries for classic histogram,
they kind of had to take into account
that you actually have a bunch of series
and they just happen to be the buckets of a histogram.
And now you have a histogram series, you apply a function to it, and it does the right thing.
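(For a flavor of that difference, here is a hedged sketch of how the two 99th-percentile queries tend to look; the metric names are placeholders, and the PromCon talk linked in the show notes is the authoritative reference.)

package main

import "fmt"

// Classic histograms: the buckets are separate *_bucket series keyed by the
// "le" label, so the query has to aggregate them explicitly before
// estimating the quantile.
const classicP99 = `histogram_quantile(0.99,
  sum by (le) (rate(http_request_duration_seconds_bucket[5m])))`

// Native histograms: one series per histogram; rate() and sum() operate on
// whole histograms and histogram_quantile() is applied to the result.
const nativeP99 = `histogram_quantile(0.99,
  sum(rate(http_request_duration_seconds[5m])))`

func main() {
	fmt.Println(classicP99)
	fmt.Println(nativeP99)
}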
And do you think it could potentially become an issue, as it's so easy to use them now, that everybody will just start using histograms and therefore, I don't know, you may run into
a scalability issue, into a performance issue.
I mean, I assume you've done some good performance testing on this as well, even though you said
there's a lot of optimization that went into because you don't store buckets that don't
exist.
But how about the scalability aspect and performance aspect of all this?
Yeah, so there is, I mean,
you have concepts, right, and ideas,
and it should work.
So in one of my past talks, I created a wish list,
like what works,
what doesn't work super well
with the classic histogram,
what should work much better
with the new native histograms.
And all those wishes became true, essentially.
And the last wish was
this all should happen at a lower cost.
And ideally, so that I can finally attach labels to my histograms,
partition my histograms at will,
and don't have to worry about the humongous cost of that.
And I kind of marked this as maybe back then.
And now at Observability Day,
which is one of the day-zero events at KubeCon, the recent one in Amsterdam, I essentially gave fresh results, hot off the press, about real production use cases of that.
And the bottom line is essentially you get 10 times
the resolution for half the price.
It's so hard to compare
because they're so different.
And especially I compared the use case.
We had one framework, essentially.
It's Weaveworks Common.
It's like maybe some people use this as well.
It's like Weaveworks is also like
an important player in the cloud native space.
And we use an open source framework of theirs for microservices,
which is already instrumented.
They are also very early Prometheus adopters.
And this is a framework we use for many microservices at Grafana.
And it gives you an HTTP server that is instrumented with classic histograms
that are actually partitioned by all those labels you want.
It's a very expensive histogram.
And I switched this over to a native histogram
with 10x the resolution.
And then put it out in the wild and see what happens.
And the good thing is, this talk goes into all the details.
We should probably also have it in the show notes.
But the bottom line
here is, for one,
especially if you do all this
partitioning, you get a lot
of sparse
bucket populations.
Intuitively, it's easy to understand.
Your 404s will probably
all have a very similar response
time. You essentially just have to find out
this is an endpoint that doesn't exist
and throw like a 404 error back
and it takes one millisecond, mostly, right?
So your histogram for the 404s
will just have a few buckets
populated around one millisecond,
while your 200s will maybe have a spread
because you have different workloads.
But maybe then again, a certain endpoint will have a typical latency.
So you get fewer and fewer populated buckets,
the more fine granular you partition your histogram.
And with the new native histograms, that means less effort to store that.
And then you can partition, because you now have a sublinear growth in cost.
And the outcome was in the end that essentially all the buckets you have
with this original histogram, super low resolution but partitioned,
it's about the same amount of buckets as populated buckets with the native histogram,
same partitioning but 10x resolution.
And then it's stored not in individual time series,
which gives you another
lever of reducing
cost. And this is where, in the end,
it's like 10x the resolution for half
the price. That's where it's coming from.
But the good thing is
if you run into problems with
this is too expensive, you just say,
okay, let's use a lower
resolution. Nothing breaks. You have lower resolution,
of course, but you can still aggregate
everything. It's not painful to change
that, and it's very easy to
adjust to your desired resource
cost.
One more question on
the performance testing.
I know you said you just then turned
it on and see what was happening.
It's kind of like testing in production.
But did you build any internal testing tools
for that in the beginning to just create a lot of data
and to see how things react?
I mean, very early, that was my starting point.
And that's also an important story,
an important question that people ask.
Those concepts we are using here are not very new.
This whole idea of having a sparse histogram,
there are so many implementations for that.
There have been metrics vendors that have been offering this for a while.
Why is Prometheus so much behind?
And one of the reasons was that
with the conventional view,
for example, especially if you have a vendor that just collects your metrics,
you kind of collect a histogram for a minute and then you package it up, send it to your vendor, and then you start anew.
So you always have this clean slate after every minute.
And my fear was in Prometheus that you have to collect the data essentially permanently
because anytime somebody can come along and scrape
at whatever interval they want,
it's also called stateless scraping, right?
So you can never say, okay, this histogram has been scraped,
I can erase it now.
And the important result was that collecting a histogram for a minute
often already fills a lot of buckets
and not many more buckets get filled if you
collect for an hour i call this like entropy accumulation and that was a very early experiment
i did on real life data like i just looked at latency data from our production systems
and then i didn't like didn't even use like a prometheus i just collected the data and did
some math and number crunching on it to find out,
okay, if I now have this pocketing schema, which pocket will be populated for how long?
And then I found out that we have this nature that a lot of pockets don't get populated
and that latency is usually like it's obviously not randomly distributed.
And you go into some entropy saturation pretty quickly.
And after an hour, if you want to, you can still reset a histogram, even in the Prometheus world.
It's a counter reset, as we call it.
And if that doesn't happen too often, you don't lose too much data.
And that was the initial breakthrough where I realized we can use existing concepts and can have those kind of histograms in Prometheus.
Fascinating. And I'm so glad that I also watched all of these talks earlier, because it's really good to have a visual in your head of what your colleague was presenting and also what you presented. Brian, histograms, it's a big topic that we also hear about, right? All the percentile values from a Dynatrace side.
Yes, yeah, yeah.
What I would like to do is I need to get this recording to some of our engineers that
have basically built this type of support into our product. I know we're also working on proper histogram support
for Prometheus data as well because we are scraping.
We also understand Prometheus and we can ingest Prometheus.
But it's just really fascinating to hear
what thought went into this.
If you really sit down, this is for me the great thing
and what gives, I think, a lot of people confidence
in the whole thing.
And that's why it's so popular, right? And as you say in the Zen of Prometheus, one does not simply use histograms. And I think you just really exemplified why, because there is that much thought, and I don't want to say complexity in a negative way, but it's a very advanced and robust concept that people probably take for granted. It's definitely something that thought has to be put into. Obviously you all put tons of thought into it, which is really amazing. And then hopefully people like us can benefit from grabbing that data and letting the users use it to make everything better.
To make the world a better place, as we like to say.
Exactly.
And Björn, I guess what you said as well: in small, like in demo settings, any normal implementation would maybe also do. But the real problem comes in as you scale, as you scale your dimensions, as you have more data in the dimensions, at the scale where SoundCloud and also the other big players are now using Prometheus, you really need to have a very efficient approach to histograms. And efficient means it starts with storing the right data and not storing data that is meaningless. Thank you so much for the enlightenment, this was really, really good. Björn, what's next? Are you done now?
Can you finally retire or what's the next big topic? I was just about to say we're not even done with the histograms.
The sad news is that the full
instrumentation library support is currently only in the Go client.
And there it's really like,
I mean, Brian alluded to that, it's so simple to do it.
You essentially say, give me a 10% resolution
like bucket-to-bucket growth, and then there's your
native histogram. You don't have to think about all the thoughts that we put into it, and it just
works.
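(A minimal sketch of that Go instrumentation, assuming a recent prometheus/client_golang release with the experimental native histogram options; the field names and values are the experimental ones and may change.)

package main

import (
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

// No bucket list: just ask for roughly 10% bucket-to-bucket growth and the
// client library takes care of the exponential bucket layout.
var requestDuration = promauto.NewHistogram(prometheus.HistogramOpts{
	Name:                            "http_request_duration_seconds",
	Help:                            "HTTP request latency.",
	NativeHistogramBucketFactor:     1.1,       // ~10% growth per bucket
	NativeHistogramMaxBucketNumber:  100,       // optional safety limit
	NativeHistogramMinResetDuration: time.Hour, // optional: allows resets to enforce the limit
})

func main() {
	start := time.Now()
	// ... handle a request ...
	requestDuration.Observe(time.Since(start).Seconds())
}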
The Java instrumentation library has preliminary support. And the huge roadblock here is that this histogram representation works really nicely with Protobuf, which is the reason why we kind of resurrected the Protobuf scrape format,
which was already declared dead.
The secret history of histograms talk goes into detail there, how that happened.
And we want to create a text representation for the native histograms as well.
But that's a hard nut to crack
and people are working on that right now.
And that would unblock clients
that have never touched Protobuf,
like Python or the Ruby client,
and also make it really simple
for third-party providers
of instrumentation libraries so
that it's everywhere. But right now, everyone is probably pumped to try it out.
But if you're not using Go,
you cannot try it out easily.
That's the sad news, and that still has to be done.
So everybody rewrite all your code in Go,
re-architect your entire organization for Go,
and the excuse will be just so you can get the histograms.
And here we go.
The business case.
Here we go.
There you go.
Hey, now we can't stop saying Go.
Oh, brother. Yeah, and I just have to ask, you know, on the backend Prometheus side, is Prometheus being used to monitor Prometheus? Yeah, of course. Yes, of course, that's the answer. Yeah, that was one of the insights. I mean, now it's kind of, everyone talks about that
or has already realized that.
But let's say 10 years ago, that was still a big deal,
talking about developers that never are concerned
with any kind of ops work.
And this all, I wouldn't even call it shift left or shift right. It's just that the kinds of tasks you do become more similar.
I became an SRE in 2006
when nobody knew what that is. But the whole idea of using
essentially a software engineering approach to operational
functions, that was already very, very important
in our very complex world we have now.
I mean, back then, big internet giants had this, and now everyone has this problem.
But also for developers, that they say, okay, I have to be concerned about instrumenting my code,
and I can actually use it in my debug cycles.
If you have instrumented your code with all the signals that are out there,
you can use it to optimize your software, to debug your software.
And of course, this was such an, I don't know,
it was such an enlightenment in a way to talk about the fire again,
that you link in the Prometheus instrumentation library.
You don't even instrument a single thing,
but it gives you all the runtime metrics.
It gives you process
metrics. For Go
binary, you get Go runtime metrics. For Java
binary, you get Java runtime metrics.
And just having this all the time,
at any time you can look,
okay, what's my heap size, like all those
things, and you have
hopefully collected this over time.
This is so valuable, even for the development
process and optimization and everything.
And now, of course, we do this with other signals,
like continuous profiling becomes a thing now.
And yeah, I mean, all of that helps during development.
And so it's not even shift left or shift right.
It's just like shift everywhere.
And of course, Prometheus,
we instrumented Prometheus with Prometheus from the beginning, and it was super valuable.
But also, Go is coming from Google as well,
and they had these insights
from the beginning. So Go comes with the
built-in profiling endpoints and really good debug
tooling, profiling tooling. And of course,
that helped us so much for optimizing Prometheus itself, which is, of course, not distributed
tracing. So it's not as exciting for you, I guess. But it's kind of tracing, if you want,
and helps a lot. And Prometheus itself is just a single binary server.
And it's kind of simple on purpose,
which doesn't apply to vendors that implement the Prometheus API;
then of course they have distributed systems
and then they start to get into
all the nice additional complications.
And of course, Prometheus is also instrumented
with OTEL tracing, right? So if you want that
for Prometheus alone
it might make sense in some situations
to do this but also if you just use
the code
and you link it into your implementation
of a distributed Prometheus
it's good that it's all there
and you use all those signals
and it's a perfect full circle
right?
Hey Bjorn,
after 10 years of
Prometheus and I'm pretty sure
you are still excited about histograms
and there's still so much stuff to do.
I assume this is not going to be
the only podcast we do with you.
At least I hope so. It's not going to be the only one, but we will do more with you
as new things come up that are relevant for our listeners,
because I think this is extremely relevant.
And I think from multiple angles.
The one angle is because most of the people that we interact with,
Brian and I, Prometheus is just there.
And so it's for us great to learn more about it,
understand it better.
But also what for me personally was very interesting,
just the performance aspect of Prometheus itself.
That's also interesting because we have a lot of listeners,
I believe, that are or at least have a background
in performance engineering.
And this is why also thanks for giving us
a little bit of insights
there as well.
Yeah.
But with this,
I don't know,
did we miss anything?
Anything beyond
that you need to get
off your chest
that you think
this is something
you need to say?
Maybe you should
invite my colleague
Bryan Boreham
who has done
a lot of
optimization PRs
recently.
Like every
Prometheus minor
release had another X percent CPU or memory decrease
because he kind of did the profiling dance
and found another thing
and is also really good at finding those things and making them better.
And then you realize,
oh my gosh, we wasted so much memory all the time
because we didn't write the proper code.
But yeah, that's how software engineering works.
I was going to say,
to state the obvious, what people often miss
is that, oh, we're using so many resources
because we didn't write the proper code.
Like, yes, that's the source of so much of our business.
That's an amazing observation.
All right, Brian, should we bring it home?
Let's bring it home, Andy. Did I call you Handy? I said "home, Andy" and in my head I heard "Handy." I'm going to start calling you Handy now. You're a very handy person to have around. See, you're valuable, Andy.
More so than I am on this podcast. Except, I think we always have different episodes, right, where one of us is just more involved. I know I was talking a lot today because it was still fresh in my head, because I watched the documentary this morning. Yeah, I think it's amazing there's a documentary about a technology that's not about the numbers and the words or the deep side of it.
I don't know if that's a first.
I mean, obviously there's histories of Windows and Microsoft and stuff like that, but for
there to be a documentary about something like Prometheus is quite amazing.
I think the other big takeaway for me today, at least, was that maybe people don't have to go as deep as Björn and team did on things like histograms, but I think it's important for people to understand the metrics they're using. Because when you talked about histograms and how you can't just take an average of the percentiles, or the percentile of the percentiles, however it was you were explaining it in the beginning when you first started talking about the histograms, if you don't know what's behind the metric you're using, how can you properly use it? So I think that's an important thing. And that even came to light way back in our early days, Andy, when everything was averages and then people were like, yeah,
you should really be looking at like maybe a 50th and 90th percentile.
Cause that's, you know, your average can be wildly wrong. And then like,
oh wow. Now that I think about it,
which I never took the time to do before, it makes total sense. So that just proves that point some more. So anyway.
Yeah. Thank you Bjorn so much for being on and Andy, thanks for arranging
this. I hope everybody got something out of this one today. It was amazing to have someone who
started Prometheus, well, I guess, jumped on board at the very beginning. I don't want to give you
credit because you don't want the credit of starting it, but you were right there at the
beginning, we'll say, and I won't use your title. I don't want to put onus on you.
But yeah, someone who was there from basically the beginning, who's a main
contributor. And let's just call you responsible for all of Prometheus.
Let's just build you up. Fantastic having you on.
What? And the internet. And the internet, yes. And the internet.
You gave Al Gore the idea.
So it's an amazing honor to have you on.
Thank you so very much.
And we hope everyone enjoyed it.
And look forward to having you on again.
And thanks for everyone for listening.
Great.
Thank you very much.
Thank you.
Bye-bye.
Bye-bye.