The Changelog: Software Development, Open Source - Gerhard goes to KubeCon (part 2) (Interview)
Episode Date: December 27, 2019. Gerhard is back for part two of our interviews at KubeCon 2019. Join him as he goes deep on Prometheus with Björn Rabenstein, Ben Kochie, and Frederic Branczyk... Grafana with Tom Wilkie and Ed Welch... and Crossplane with Jared Watts, Marques Johansson, and Dan Mangum. Don't miss part one with Bryan Liles, Priyanka Sharma, Natasha Woods, & Alexis Richardson.
Transcript
Bandwidth for ChangeLog is provided by Fastly.
Learn more at Fastly.com.
We move fast and fix things here at ChangeLog because of Rollbar.
Check them out at Rollbar.com.
And we're hosted on Linode cloud servers.
Head to Linode.com slash ChangeLog.
This episode is brought to you by DigitalOcean.
DigitalOcean is the simplest cloud platform for developers and teams
with products like Droplets, Spaces, Kubernetes, load balancers,
block storage, and pre-built one-click apps. You can deploy, manage, and scale cloud applications
faster and more efficiently on DigitalOcean. Whether you're running one virtual machine or
10,000, DigitalOcean makes managing your infrastructure way too easy. Head to do.co slash changelog.
Again, do.co slash changelog.
Welcome back, everyone.
This is The Changelog,
a podcast featuring the hackers,
the leaders,
and the innovators in software.
I'm Jerod Santo,
managing editor of Changelog Media.
I hope you're enjoying
these last few days of 2019. This is our final episode of the year. Gerhard is back for part two of our
interview series from KubeCon. Join him for some deep, lengthy conversations on Prometheus,
Grafana, and Crossplane. Oh, and one last note before I pass the mic. If there's an interesting
topic or a great guest that you would like to hear on the show, let us know at changelog.com slash request. We would love to hear from you. That's it.
Enjoy. Today we have around this square table, rectangular table. We have Bjorn from Grafana.
We have Fred from Red Hat and we have Ben from GitLab. All of them are Prometheus contributors.
So this is going to be a technical discussion. We're going to mention a lot about cool things
about Prometheus and who would like to get us started? Sure, I'm Ben. I'm a Site Reliability
Engineer at GitLab. I've been contributing to the project for quite a number of years now. My focus is on getting developers and other systems to integrate with Prometheus.
So I don't work on the core code so much,
but I try and help people get their data into Prometheus
and then learn how to actually turn that into monitoring.
Sure, I'll go.
All right, my name is Bjorn.
I work at Grafana, but that's quite recent. I'm now,
fortunately enough, kind of a full-time Promethean. So my company pays me to contribute
to the project, and I also do internal Prometheus-related things. Previously,
until like half a year ago, I was at SoundCloud, where Prometheus had its cradle, as I like to say it. And there we kind of had other jobs.
We were like production engineers or site reliability engineers or something.
Ben was also there.
And we had to create Prometheus for doing our job as a tool.
But it was always like a side business, in a way.
It sounds kind of weird now that it's so popular.
Yeah, I'm Frederik.
I am an architect at Red Hat. I'm basically the architect
for everything observability.
And I happen to have started
with Prometheus in that space
roughly three and a half years ago.
Even
though it's been three and a half years, I think I'm
the most recent at this table to
have joined the Prometheus project.
Yeah.
And one thing which I'd like to add is that this year,
for the top contributor in the cloud native landscape,
the award went to Fred, right?
So I think Bjorn, you were mentioning earlier that Prometheus,
the contributors got awards several years in a row.
Every single year, one of the Prometheus contributors got some sort of an award.
There's like a streak going on here.
Is that right?
You might think it's like a political thing that we have to get an award, but I think we really have a bunch of awesome people.
I think Prometheus, looking at how it grew, right?
Everybody's looking at Kubernetes and everybody knows Kubernetes.
But Prometheus is also a graduated project in the CNCF.
And a lot of activities happening around Prometheus, around observability, around metrics.
I find that super interesting because it's not just about the platform. It's also all the other tooling that goes in the platform.
And Prometheus is one of the shining stars of the CNCF. We were the second
graduated project. There we go. We almost graduated first, but I guess Kubernetes had to take that. They are also a much bigger project, so there was way more effort. For us, it's kind of easy to graduate. But interestingly, I did this for a talk recently where I thought,
graduated, does that mean we're done?
It's kind of stabilized.
We just get maintenance PRs.
And CNCF has this DevStats tool.
It's a Grafana dashboard, shameless plug, where they can plot.
They just evaluate activities among companies, among contributors.
And you can just draw graphs of how actively this project is contributed to. And if you look at the Prometheus graph, it looks like from the moment of graduation you actually got more activity. It's probably smaller things that are not so visible, but a lot is going on in the Prometheus ecosystem.
Right. And you only just had PromCon not long ago. How was that? Like two weeks ago, one week ago?
It was very recent. Yeah, that was the second week of November. It was great. It's a very small community gathering. We were actually sad this year; we wanted to expand the size of it, but we just couldn't get a venue big enough that was available when we needed it. So yeah, it's a small, 220-person conference, and it's all talks about Prometheus and the development of what's going on, and people's stories and how they use Prometheus.
Tickets were highly sought after. It felt like a rock concert.
Yes. And I think even our live stream was well visited, right?
Yeah, I think we peaked at something about 80 people. The live stream was a little unreliable this year, but we'll hopefully do better next time. All the talks will get proper recordings on the website.
Yes.
Everybody can watch that.
I think what's super exciting about PromCon,
I believe all of us have been at every official PromCon.
I think there was one unofficial.
Oh, no, you.
I was at the first unofficial PromCon Zero.
Okay.
You were too, right?
It was at SoundCloud, mostly.
I mean, that was... We called it PromCon
when developers came together
to prepare the 1.0 release.
But then the real PromCon happened.
I was the first.
The most recent one.
I think what's really interesting about
how PromCon has evolved
over the last couple of years
is that in the
first two to three years,
I think it was very, very Prometheus development focused.
And this year, last year also already, we've seen this a lot,
that I think the entire community is kind of evolving
that Prometheus is a very stable project,
and we're now more demonstrating how it can be used in
extremely powerful ways. And I think that kind of reflects, in some way, the graduated status, because people can rely on it; that's why we're seeing all this adoption that is just incredible. I think also this ecosystem doesn't have a strict boundary. You have lots of projects that are not Prometheus projects, but they are closely related, and there are loads of integration points. It's open source, it's open community, and I think that really works well.
One thing which I really liked about Prometheus is this emerging standard of OpenMetrics. So it's less about a specific product,
it's more about a standard which people and vendors are starting to agree on. And I think
that is such an important moment when you have all these companies saying, you know what, Prometheus
is onto something. So how about we stop calling it the Prometheus exposition format and start calling it OpenMetrics.
Did you have any involvement with that?
Yeah, so I'm one of the people that started the Open Metrics project.
And, you know, as a site reliability engineer, I'm working with my developers to instrument their code and make it so that I can monitor it.
And I also have to work with a lot of vendor code.
And for a long, long time,
the only real proper standard has been SNMP.
But SNMP for a modern developer
is extremely clunky and really hard to use.
And it's not cloud native,
if we want to use the buzzword. And as an SRE, I don't actually care if vendors use Prometheus, but we need OpenMetrics as a modern standard to replace SNMP as the transport protocol of metric data.
And I really like how the metrics,
so open metrics, open telemetry,
which is a combination of open census
and open tracing.
Thank you very much, Fred.
So the combination of these two,
how does open metrics fit into open telemetry?
So OpenTelemetry... it comes from OpenTracing and OpenCensus. OpenCensus was this idea of creating a standard instrumentation library that handles both the tracing and the metrics and some of the logging pieces. And this is a really great idea, especially, like, you know,
from when I'm wearing my SRE hat,
is that you have a standard library for instrumenting your code.
And the OpenMetrics is just the way you can get,
or is what I think should be,
is the way you get the metric data out of OpenTelemetry.
And so it's just kind of the standardized interface
because the tracing interface is kind of still young and fast-moving
and it hasn't settled down.
But the Prometheus and OpenMetrics standard
is something that we want to see last for as long as SNMP has lasted.
SNMP has been around since the early 90s,
and it hasn't changed much,
and the data model is actually quite good
with being clunky and a little bit designed
around 16-bit CPUs and things like that.
But we want to see the Open Metrics transport format
be this long-term, stable thing that vendors can rely on
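To make that concrete, here is a minimal sketch of what a Prometheus-style scrape payload looks like; the metric names and values are invented for illustration, and OpenMetrics formalizes this text format (adding, among other things, a terminating # EOF marker):

    # HELP http_requests_total Total number of HTTP requests handled.
    # TYPE http_requests_total counter
    http_requests_total{method="get",code="200"} 1027
    http_requests_total{method="get",code="500"} 3
    # HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
    # TYPE process_cpu_seconds_total counter
    process_cpu_seconds_total 12.47
    # EOF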
So we have metrics, and the story is really good. We have traces, and the distributed tracing story is really good as well. Where are logs, or events, as some like to call them? Where do they fit in, in this model?
And I'm looking at Bjorn because I know that Loki is this like up-and-coming
project. We'll be talking later with Tom about Loki and there's, I forget
his name, but he's the maintainer of Loki or the head behind Loki as Tom got him.
Actually, we have a bunch of people I can find working on Loki.
It's like a big deal, obviously.
But I don't even feel like I would do them justice if I were the one to tell it.
You should probably ask later.
I mean, you should take it from the other way around. Prometheus is often, like...
people see Prometheus, they realize it's like this hot thing that they should use.
They see all the success they have and then they try to shoehorn all their like observability use cases into Prometheus.
And then they start to use Prometheus for event logging.
And Prometheus is a really bad event logging system.
And that's a lot where we have to fight; we have to convince people that they shouldn't do this, even if they're angry at us. But then there's also the backlash the other way, where the log processing people try to solve everything. Yeah, I mean, we kind of have more of this inclusive picture, that you need all those tools and you need to combine them nicely.
And Loki has this idea where you take some parts of Prometheus, which is the service discovery and the labeling, and use the exact same thing for log collection. And then it's easy to connect the dots and jump from an alert with certain labels into the appropriate logs that you have collected.
It goes into that direction.
But I guess you will talk a lot about that with Tom.
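As a small illustration of that shared-labels idea (the label names here are hypothetical, not taken from the conversation), the same selector works on the metrics side in PromQL and on the logs side in Loki's LogQL:

    # Metrics side (PromQL): request rate for one workload
    rate(http_requests_total{namespace="prod", app="api"}[5m])

    # Logs side (LogQL in Loki): the same label selector, filtered for the word "error"
    {namespace="prod", app="api"} |= "error"

Because the labels come from the same service discovery, jumping from a firing alert to the matching log stream is just a matter of reusing its label set.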
Yes.
Actually, I'm a strong believer in connecting different signals via metadata.
Actually, Tom and I did a keynote at KubeCon Barcelona
about exactly this topic.
So I highly recommend people checking that out.
Okay.
Are the videos out yet from Barcelona?
Yeah.
Are they?
Cool.
It's not only him recommending himself.
I recommend that as well.
Right.
Okay.
Yeah.
And from the Prometheus project perspective,
I see it as with Prometheus,
we have a very specific focus
and we kind of follow a bit of the Unix philosophy of,
as an engineer, I want a tool that does one thing and does it well. And, you know, I look at some of these large monitoring platform things, and I see a lot of vendors that also combine monitoring and management into the same platform. And with Prometheus, we explicitly don't have any kind of management. We don't even have any templating in our configuration file,
because different organizations have completely different ideas on what they want for their
configuration management to look like. You know, you have things like Kubernetes and config maps and operators and that,
and then you might have another organization
that they're doing everything with a templating configuration management
like Chef or Ansible or one of those.
And so the layering approach to observability is really, really important to me because I want a really good logging system and I really want a really good metric system.
And I really want a good tracing analysis system and crash dump controls and profiles.
And to me, those are all different pieces of software
and I need to combine them.
And there's no one magic solution
that's going to solve all my problems all at once.
So I can see this idea of the building blocks
and having the right building blocks,
right being a very relative term in this context,
because right to me is different than right to you.
So this choice of selecting whichever building blocks are right for you
and combining them, again, whichever way is right for you,
and then you get this like almost everybody gets what they want
and yet the pieces exist that they can be combined in almost infinite ways.
So Prometheus has grown a lot.
Prometheus is like on a crazy trajectory right now from where I'm standing.
And I would like to zoom in a little bit in a shorter time span.
So, for example, the last six months, just to get a better appreciation of all the change that is happening in Prometheus. Let's focus on the last six months, the big
items that have been delivered and the impact they had on the project.
We should also say there are so many... we call each repository in the Prometheus GitHub org a project, and there are many projects. Alertmanager is probably something very famous, the node exporter is pretty active and big, and all those things.
But every project has new stuff going on.
And I think we should restrain ourselves to just the Prometheus server itself
because otherwise we could chat forever about all the new things.
Yeah, and actually a few of us have been discussing that
the Prometheus core code is really reasonably feature complete.
And it's not actually moving that fast.
We have lots of small changes that are still important.
But the speed of the project is actually in how many additional things that are connected to Prometheus are expanding. There's a large momentum around things that are being built around Prometheus, while Prometheus itself is largely stabilizing and optimizing.
Yeah.
And then, yeah, I mean, should we talk about something new?
Of course, now that you say stuff around Prometheus,
it was always a very hot topic
that Prometheus doesn't have this idea of having a
distributed clustered storage engine built in. And we always said that's somebody else's problem.
And then we provided an interface... I think it's still experimental, right? Officially.
Officially, yes, but it works.
Yeah. So we created this kind of experimental write interface,
and now we have dozens of vendors or open source projects
that integrate against this interface
where Prometheus can send out the metrics it has collected
to something out there.
And this has seen a lot of improvements recently.
I don't know.
Does one of you want to talk about details?
Actually, even commercial vendors, monitoring platform vendors,
are starting to accept Prometheus Remote Write as a way to get data into their observability stack.
I don't think any of us actually worked on these improvements, but
I think the most notable thing that happened in remote write was: previously,
whenever Prometheus scraped any samples, it immediately queued them up and tried to send
them to the remote storage. And this had various problems, one of which is we really just keep all these samples in memory until we send them off.
And so one of the dangers was, if the remote storage was down, we would continue to queue up all of this data in memory and potentially cause out-of-memory situations, for example. And so kind of the solution to this was
Prometheus has a write-ahead log
where the most recent data is written to
before it gets flushed into an immutable block of data.
And so instead of doing all of this in memory,
basically we use the write-ahead log
as a kind of persistent on-disk buffer
and that write-ahead log is tailed, and then we send the data off based on that. So I think this is one of those things where the feature actually hasn't changed at all in its functionality; just the implementation itself changed to be a lot more robust than it used to be. And I think that's really exciting,
and it kind of shows the details
that we're starting to focus on in Prometheus.
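For context, hooking a Prometheus server up to remote storage is a small piece of configuration; the snippet below is a minimal sketch with a placeholder endpoint URL, and the queue tuning values are only examples. The WAL-based buffering described above happens internally and needs no extra configuration:

    # prometheus.yml (excerpt)
    remote_write:
      - url: "https://metrics-store.example.com/api/v1/write"   # placeholder endpoint
        queue_config:
          capacity: 2500              # samples buffered per shard before reading more from the WAL
          max_shards: 200             # upper bound on parallel senders
          max_samples_per_send: 500   # batch size per request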
So for all those projects
that are being built around Prometheus,
it's very important,
it's becoming even more important for the core
to be more robust, to be more performant,
to be dependable, right?
So that it can support all those extension points and all that
growth.
Yeah, I guess if it's still experimental, you should do something about that. Yeah, let's see.
Should we talk about the flip side of that, the remote read?
Yeah, because that is the flip side of it. If you have a Prometheus server that has stored stuff away into remote storage, often those
remote storage providers
have their own query engine.
Sometimes they even support literally PromQL
and you can work on that.
But sometimes you just want your Prometheus server
to know about that data
that has been stowed away somewhere.
And there's the flip side of the remote write, which is remote read.
And that, yeah, I mean,
that's also kind of still experimental,
but there was a similar problem.
Who wants to take this from memory?
Should I go ahead?
It's actually, we're all not the domain experts in that, right?
So the problem there was that Prometheus runs a query
and then the query engine has to retrieve the data
and the API looked like it would essentially get all the samples that this query had to act on in one go. So the remote backend for that had to construct all those samples in memory on their side and then send it all over, and Prometheus had to receive it all on its own side, so it's all there. And that could have a huge impact on memory usage in that moment. I mean, that concretely happened: both parts would build up this huge amount of samples in memory, the backend would, and then Prometheus has to read it. And Prometheus has a really efficient way of storing time series data in blocks in its own storage, so the idea was to just stream the data. Streaming is anyway the hotness, right? It's all one stream; you don't have to build it all up first and then send it out.
I think it also reuses the exact block format of Prometheus.
Yeah, the big problem with the remote read was that we have all this compressed data on disk and in memory,
and the remote read would decompress it, serialize it, and then send it out over the wire completely uncompressed,
and it was using huge amounts of bandwidth.
Actually, was it taking it and then snappy compressing it,
if I remember correctly?
I believe so, yeah.
Yeah, so it would take a well-compressed time series block,
serialize it, and then recompress it with a generic compression,
and this was just kind of silly.
In hindsight, yeah.
In hindsight, yes.
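For completeness, the read side is wired up in a similar way; this is a minimal sketch with a placeholder URL, since the compression and streaming behaviour discussed here lives inside the protocol rather than in the configuration:

    # prometheus.yml (excerpt)
    remote_read:
      - url: "https://metrics-store.example.com/api/v1/read"   # placeholder endpoint
        read_recent: false   # let the local TSDB answer queries for data it still has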
Yeah, and this doesn't just benefit
the Prometheus server itself,
but basically this is, again,
there are a bunch of integrations
around Prometheus that benefit from this.
Yeah, but I think Thanos was,
that was a big deal for Thanos, this improvement.
Yes, because Thanos essentially sits next to a Prometheus server
and uses this API to read raw data from the time series database.
And so it was a big deal for this component to have this more efficient way of doing it
because Thanos itself had already this streaming approach.
So it loaded everything into memory and then sent it off in a streaming approach, and now it can actually make use of all of these things.
So why do you think that this remote write and remote read are becoming more and more important these days? I mean, is something happening with Prometheus? Is it
getting to a point where this
is becoming more important? Why is it an important thing now? As users of Prometheus grow,
they grow beyond the capacity of one Prometheus server. And Prometheus was designed from a background of distributed systems, and where Prometheus got its inspiration, we had hundreds or thousands of mini monitoring nodes, and each of these mini nodes would watch one specific task and keep track of one small piece of the puzzle. And as people grow their monitoring needs, they're running into the same exact problems, where a single monitoring server is not powerful enough to monitor a whole entire Kubernetes cluster with tens of thousands of pods, and multiple clusters that are geo-distributed. So they're running into the same problems. And being able to take Prometheus
and turn it into just the core of a bigger system
means that you need these in-and-out data streams
in order to make it the spokes of a full platform.
So that's another hint as to the popularity of Prometheus and the
use cases for Prometheus: the machines are not big enough to be able to run everything on one machine, so again, it got to the point where you need more than one. And what does that look like? So this is a story and a use case,
which is becoming more and more relevant.
So there was the remote write, the remote read,
important improvements in the last six months.
What other things are noteworthy?
I mean, it's actually a bit longer ago than six months
where we decided we'd go on a strict six-week cadence of releases.
Similar to Kubernetes, but they have a longer cadence.
Three months.
Go has this similar thing.
I mean, personally, my ideal is always you should just release when you have something to release.
And in the ideal world, that just works.
But in the real world, people just procrastinate, and we had seen this: nobody was bothering to release a new Prometheus server, and then we had way too many things piled up. So we just said, okay, every six weeks. And should we ever reach the point where we have a new release and nothing interesting has happened, we can reconsider that. But so far we have done this now for almost a year, I think.
Yeah. So we always get a release shepherd nominated ahead of time, and then you cut a release candidate, you tell the world that they should try it out, and then usually you get a fairly stable dot-zero release. Like, what is the current one, 2.14?
2.14.0. I think we didn't have a bug fix release for that one, right?
Yep, that worked. That was during PromCon, actually, where we released that, but that was just coincidence, because it's a strict six-week cadence, right?
Yeah. So every time there's something interesting happening... And since... yeah, releases go out. But we also have this all built in, like, the benchmarking tooling. Our internal benchmarks
are way better now, and it's all part of the procedure
to run benchmarks to see regressions.
We had a few of them in the past.
Nice, interesting new features, but also, sadly,
the new feature was that everything is a bit slower.
So that can't really happen anymore. Or it happens in an informed way, where we say,
okay, now we have, whatever, staleness handling,
and we accept that this has a tiny performance penalty.
So, yeah.
At least we can, because we have all of these tools,
we can do these things in a controlled way, right?
As opposed to realizing these things
after we've already released it
and users opening issues.
And one thing that personally for my organization
is really cool about the regular release schedule is
we know exactly when the next release candidate is going to be cut.
So the SRE team can plan canarying these kinds
of releases and contribute back
with issues and so on and I think that's
that's also for us as maintainers really powerful to get more consistent feedback
Do you see the adoption of new releases? Is there a way of seeing what the adoption is? And what I mean by that: maybe number of downloads, maybe something that will tell you, okay, the users are upgrading and they're running these new releases. Is there such a place that you have, maybe publicly available?
Yeah, there are counters for looking at how many downloads we get from the official releases. There's also how many people pull the Docker images, but we're not really paying attention to this. We're more focused on development than marketing numbers.
Do we have, like, GitHub download counters?
Yes.
Okay. But we mostly don't even pay attention to that.
But then also, of course,
some organizations wouldn't even download directly from GitHub.
They just download it into their own repository,
so you can never know.
We would need to put some phone-home mechanism into Prometheus,
and we're not doing that.
But Grafana has some usage tracking of their installed instances, and they also report back which data sources are being used by each Grafana instance. And every PromCon has a little lightning talk by some Grafana person telling us how many Grafana instances there are in the world that phone home, and how many of them have Prometheus as a data source. The Grafana growth is crazy, but the percentage of Grafana instances using Prometheus is also growing like crazy; it's like a second order of growth. And I think this year we hit the point where more than 50 percent of Grafana instances have a Prometheus data source.
That's mind-blowing.
Okay.
So releasing new versions, having the six-week cycle
when users can expect a new version to be cut,
a new version to be available.
Do you do anything about deprecating old versions
or stopping any support for older versions?
It's largely on an ad hoc basis.
If there is someone who is willing to backport a fix, I think we generally are open to cutting another patch release.
Sometimes we, as Red Hat, support older versions in our product, for example, and that's when we do those kinds of things.
I don't think we have a set schedule of when we don't support anything anymore,
but it generally doesn't happen too often.
It happens.
I mean, also, we are on major version 2,
and we have a few features listed as experimental
that can actually have breaking changes.
Breaking changes? It's getting hard. Third day of the conference.
Where you could not just seamlessly upgrade, but most features are not experimental. So
there's very few reasons for somebody to not go to the next minor release.
Sometimes we have like little storage optimizations
where we try, after some problems in the past,
where you couldn't go back from.
Once you have gone to the higher version
and the storage has used the new encoding version internally,
the older versions couldn't act on it.
And we are now doing things like where you have to switch it on
with a flag in the next minor release,
and then it becomes default, but you could still switch it off,
and then it becomes the only way of doing it or something.
It's very smooth, and I think rarely...
I mean, some companies have these very strict procedures
to whitelist a new version,
but in general, it's happening rarely that someone says,
I really still have to run Prometheus 2.12.
Could you please have this bug fix release for 2.12?
As a matter of fact, I don't remember the last time
we've done anything like this.
Yeah, the releases are always upgradable
within the major version.
So the incremental upgrade is completely seamless.
It's just drop in the
new version, restart, and away you go. There's been no real problem with upgrades.
Yeah. Interestingly, I also work on one of the projects that integrate around Prometheus, called the Prometheus Operator, and we actually test, to this day, upgrades from Prometheus 1.4, I believe, up until the latest version.
Amazing. Okay.
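For readers who haven't seen the Prometheus Operator, upgrades there are driven declaratively; a minimal, hypothetical custom resource looks roughly like this, and bumping the version field is what rolls the managed instances to a new release:

    apiVersion: monitoring.coreos.com/v1
    kind: Prometheus
    metadata:
      name: example
    spec:
      replicas: 2
      version: v2.14.0            # bumping this rolls out a new Prometheus release
      serviceMonitorSelector: {}  # select which ServiceMonitors to scrape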
Yep. Should we find something else to talk about? We could talk about unit testing rules and alerts.
Alert testing is a big deal. I have discussed this actually quite often recently:
how you actually make sure that an alert will fire if you actually have an outage.
This is a big, arguably not quite solved problem,
but at least in Prometheus you can now unit test your rules,
recording rules as well as alerting rules.
It's all built into promtool,
this little command line tool
that's distributed alongside with the server.
And there's a little,
kind of a domain-specific language, if you want,
to formulate rules that you can write.
This is how the time series looks, and then I want this alert to fire in that way,
all those things.
I think they have a blog post on the project website.
Yeah, do we think we have a...
That's pretty cool.
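As a rough sketch of what such a unit test looks like (the rule, job, and label names are invented for illustration), promtool takes a YAML file describing synthetic series and the alerts you expect to fire:

    # alerts.yml -- the rule under test
    groups:
      - name: example
        rules:
          - alert: InstanceDown
            expr: up == 0
            for: 5m
            labels:
              severity: page
            annotations:
              summary: "Instance {{ $labels.instance }} is down"

    # tests.yml -- run with: promtool test rules tests.yml
    rule_files:
      - alerts.yml
    evaluation_interval: 1m
    tests:
      - interval: 1m
        input_series:
          - series: 'up{job="api", instance="host-1"}'
            values: '1 1 1 0 0 0 0 0 0 0'   # the target goes down after three minutes
        alert_rule_test:
          - eval_time: 10m
            alertname: InstanceDown
            exp_alerts:
              - exp_labels:
                  severity: page
                  job: api
                  instance: host-1
                exp_annotations:
                  summary: "Instance host-1 is down"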
Yeah, I think, again, this is one of those things
where it shows the maturity of the project and the ecosystem
that people don't only care about monitoring and alerting,
but they also care about actually testing their alerting rules.
So we talked about the big, noteworthy initiatives that have been delivered in the last six months, the most exciting stuff. What about the next six months? What do you have on your roadmap? Things which are worth mentioning?
I mean, we have a roadmap on the website, but it's almost obsolete, because I think most of the issues or items there have at least almost been implemented. So I think it's time for getting more into visionary things. But there are also some things very concretely happening. One thing, probably,
that will be really
visible. It's like a new UI for
the Prometheus server.
Some people just use Grafana
as their interface for
Prometheus, but originally when Prometheus
was created, there was no Grafana.
We actually had our own little dashboard builder, but
Prometheus was really meant to...
Why are you laughing?
Hey, I'm still a Promdash fan.
Okay, so it still has fans.
Stuart will like you now.
So whatever.
So we want to talk about the future.
So the UI on the Prometheus server was always very simplistic,
but I totally loved it.
It was my daily tool to work with.
But yeah, it kind of hasn't aged that well.
So we're replacing our handwritten JavaScript from 2013 or so
with a nice new React user interface.
And it's now in 2.14, and you can go give it a spin.
There's a button to click to try the new UI.
Okay.
This will give, like, a lot of...
I mean, this is essentially at the moment
just reconstructing all the features we have,
but this will allow, like, modern stuff,
like proper autocompletion and tooltips
and all those things.
That will be very easy to include.
You get a glimpse of it if you use the Grafana Explore view.
It's a lot of stuff, but that's all very much wired into Grafana.
And in the Prometheus UI, we try to get this in a more generic form,
and we also want to be able to do this LSP, the Language Server Protocol, which is this generic way where IDEs can inquire from a server what to do with autocompletion and stuff. So this could work for the Prometheus UI itself. But there's actually an intern at Red Hat, working with Fred (Tobias); he's working on this, just implementing this LSP for PromQL, and then
you can point your VS Code
to that, and suddenly you get auto-completion
in your editor of writing
rules, and that's so cool.
Yes, I'm really excited about that.
I'm also really excited to
finally get those
beautiful help strings that are on all the metrics output into the basic user interface,
because this would help all of the users of Prometheus
to be able to see what does this metric name actually mean
and get the extended help information
and the explicit types that we have.
We have this data in Prometheus
and it's been many years and not exposed to the user.
As a matter of fact, I saw a demo
last week showing exactly
that.
This was... I mean, I always tell the story of Prometheus as it has started with the instrumentation. It's instrumentation first, and we always put in there that you had to describe your metrics with a help string, and you have to tell that it's a counter or a gauge. And then Prometheus was just not doing anything with that information, and that was lasting for way too long. And now something is happening.
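For anyone who hasn't seen that instrumentation-first style, here is a minimal, hypothetical Go service using the official client_golang library, where the help text and the counter type travel with the metric and end up as the # HELP and # TYPE lines on /metrics:

    package main

    import (
        "net/http"

        "github.com/prometheus/client_golang/prometheus"
        "github.com/prometheus/client_golang/prometheus/promauto"
        "github.com/prometheus/client_golang/prometheus/promhttp"
    )

    // The Help string and the counter type are part of the metric definition;
    // Prometheus stores them as metadata alongside the scraped samples.
    var requestsTotal = promauto.NewCounterVec(
        prometheus.CounterOpts{
            Name: "http_requests_total",
            Help: "Total number of HTTP requests handled, by method and status code.",
        },
        []string{"method", "code"},
    )

    func main() {
        // Expose the metrics endpoint and count every request handled.
        http.Handle("/metrics", promhttp.Handler())
        http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
            requestsTotal.WithLabelValues(r.Method, "200").Inc()
            w.Write([]byte("ok"))
        })
        http.ListenAndServe(":8080", nil)
    }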
That actually resonates really well, because you're right: a lot of effort goes into describing what the metrics are, and then when you consume them, you just consume them as metrics, as values, right? And then a lot of that information, actually all of that information, gets lost. So I can see a really good opportunity
here for maybe Grafana or another UI to make use of that information, to maybe start explaining
what the different metrics are, right? As the original authors intended them. And there's a question which I have. I'm wondering how, like what are the limits for describing metrics?
When I say limits, I mean, is it like a single string? And is there like a limit of how big that string can be?
Can you add any formatting to that string? Because I'm almost thinking Markdown, which is a bit crazy, but hey, why not?
It's like the next step to this. I mean, that might evolve when we actually use it, but at the moment it's a plain text string with no length restrictions.
Right, right.
Yeah, you can write... I mean, it wasn't the help string, but we had this incident, it's out there, where somebody accidentally put whole HTML source code into a label, and Prometheus could ingest that just fine.
It looked really weird when you looked at the metric.
But we are usually not imposing any fixed limits on anything.
Or any formatting, just like plain text.
The formatting, yeah,
might evolve, we will see.
It's actually interesting, we've had the metadata API
through which you can query
the help and type
information for, I think,
about a year and a half now,
but just haven't
actually made use of it just yet.
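For reference, that metadata is already queryable; assuming a Prometheus server listening on localhost:9090, a request along these lines returns the type and help text recorded for a metric across the targets exposing it:

    curl -G 'http://localhost:9090/api/v1/targets/metadata' \
      --data-urlencode 'metric=http_requests_total'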
I think, as Bjorn
started out with the React UI,
it's a really cool
thing that we can now, with
a modern approach,
do all of these things. Julius did the initial work for this React-based UI, and just within a couple of weeks of having this in, we've had a tremendous amount of contributions to it, because suddenly we've opened up a pool of engineers that can help us out with these things. Which was kind of the initial point anyway, because nobody was really contributing to the old UI. And suddenly, we're just a couple weeks into it, and it has already validated the point that making this more accessible opens up a large pool of contributions.
Which is a very interesting point in an open-source project.
Should you go for something with a known,
big base of people who work with that? Like, let's say, React. And kind of the competing way was the Alertmanager UI, which got refurbished a while ago in Elm,
which has a way smaller community,
but a very committed community.
And we had a bunch of committed contributors.
And I think they are now obviously not happy
that this is happening in React.
But I think it's a really tough decision.
You could say it's the same when we started Prometheus
and decided to use Go and not like Java, for example.
I mean, Go is a way technically better language for that.
But back then it was, we were early adopters.
Like, we also found a lot of bugs in Go,
or feature requests that we really needed,
but it was a big bet to go into this new language
that doesn't have an established community yet,
and I think it's not a clear cut what way to go,
but this is, yeah, it speaks volumes
that we get new contributors that are super enthusiastic about coding React. I mean, I've never been that enthusiastic, but luckily there are others who like it.
So do you know how that decision was made? What to choose? Was it the size of the community, or did someone just say, oh, this looks cool, and start using React? Do you know?
I think it was largely driven by Julius. Julius wanted to learn React, actually, and kind of tried it out here. He obviously asked everyone in one of our dev summits if people think this is a good idea to actually pursue fully. And we agreed on it. I mean, I think we never had an explicit decision.
Often things just happen, which can be good.
Sometimes I think decisions should be explicit.
But again, this is not easy to make a call.
If this should be like super top-down,
we all sit together in a committee and vote about it,
or this should just happen.
Yeah, I mean, I think it's best to just let it happen, because whoever is willing to do the work is the one that should drive the change. Because we can make committee decision after committee decision, and then nobody will do anything with it. And so doing the decision making by being willing to do the work and support it is much healthier for a project.
That sounds like such an adult approach and such a sensible approach.
It's almost like, of course it makes sense.
Yeah, you're right.
Like whoever gets to do the work should, you know, decide whoever is most passionate about it.
Well, they're going to be doing the work anyway.
So why don't you just go ahead, because we trust you to make the right decision. And as it turns out, it was the right decision, right? The React community joined, and there's all this new interest that you wouldn't have had.
I mean, I don't think it's always that clear. I think a project is sometimes very complex, and some people need some guidance on whether they should even become active in a certain area. And I think we also had incidents in the past where somebody just did something and it kind of steamrolled the others, and then they felt frustrated or something. I think this is an actual hard problem. I'm actually reading a paper right now that one of my colleagues, who was in bigger open source projects,
recommended to me,
how are open source communities making decisions?
There's active research going on on that.
Like, should you have a governance structure?
I mean, we have a governance structure now.
Like it's, I think it's an interesting,
but also very hard or it's a hard problem.
That's why it's an interesting problem and important.
That's a paper which I would like to read for sure, and I know that many others will as well, so I'll look forward to that link from Björn.
Okay, so one of the things which I'm aware of as a Prometheus user is memory use. Is there anything that is being done about that in the next six months? Any improvements around Prometheus' use of memory?
As a matter of fact, we had one of our developer summits just recently. The inserts are happening, the live inserts of the data that's being scraped.
And that builds a block of the most recent two hours of data.
And then that's flushed to disk to an immutable block.
And then we use memory mapping so the kernel takes care of that memory management there.
But that most recent two hours worth of data is kept in memory until we do this procedure.
And so that can potentially make up a large amount of memory that you're using.
And so we're looking into ways of offloading this from RAM to other mechanisms. We haven't fully decided on what that is, but we are actively looking into improvements that we can make. There are various other mechanisms that we want to look into. Even within the immutable blocks of data, we want to explore, as Björn likes to say, new old chunk encodings. Because when we wrote the new time series engine, we kind of made the decision that we'll, for now, only look at one type of chunk encoding. And we've realized, looking back in hindsight, that there's probably some potential for making better decisions, potentially at runtime or at compaction time, for example, to optimize some of this data in a better way.
Yeah. Like, we had the Prometheus 1 storage engine, which was essentially hacked together.
And when it was working well enough, we would do all the other stuff.
And then the Prometheus 2 storage engine was really very carefully designed,
but also kind of reverted to just using essentially the classical Gorilla encoding from the Facebook paper.
And the Prometheus 1 storage had a few crazy hacks that we never really evaluated.
But now we can compare.
Cortex has this interesting...
Cortex is one of those remote storage solutions.
But they also use the exact same storage format.
And they support everything, all the versions back into the past. And they can directly compare how things look.
And apparently, if you just look at the encoding,
the Prometheus 1 encoding is like 30% better or something.
So we see we can actually kind of, what's the word,
like recover some of the archaeological evidence from that
and perhaps improve.
We can forward port some of the optimizations.
Yeah, the Prometheus 2 format was very much designed
to reduce the CPU needs for ingestion.
And that completely succeeded
to the point where
we actually have spare CPU.
When you look at the CPU to memory ratios
of a common server,
the Prometheus server will use
all of the memory,
but only a quarter of the available CPU in the typical ratios you get on servers.
So we could spend some more CPU to improve the compression and get us back some of that memory.
Because every time we improve our compression, it not only improves the disk storage space, it improves the memory
storage because we keep the same data in memory as we do on disk.
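As a practical aside, this trade-off is visible on any running server through Prometheus' own metrics; a few example PromQL queries, assuming the server scrapes itself under a job label of "prometheus":

    # Resident memory of the Prometheus process itself
    process_resident_memory_bytes{job="prometheus"}

    # Active series currently held in the in-memory head block
    prometheus_tsdb_head_series

    # Ingestion rate: samples appended to the head per second
    rate(prometheus_tsdb_head_samples_appended_total[5m])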
I'm sure that many users will be excited about this.
I'm very excited to hear that.
I'm looking forward to what will come out of this.
As we are approaching the end of our interview, any other things worth mentioning? Like one thing which is really worth mentioning?
I mean, no story about the future would be complete without my favorite kind of topic in Prometheus, and that's histograms. I'm probably known as Mr. Histogram or something. So, histograms in Prometheus are an extremely powerful approach, but it's kind of half-baked.
We introduced them in 2015.
And a histogram is like a bucketed counter, really broadly spoken.
And yeah, but there's...
From an SRE perspective,
histograms are extremely important
in getting more detail
out of the latency in our applications.
Several other monitoring platforms talk very loudly about histograms being important
because we need detailed data on requests coming into the system,
and an average is not good enough.
And summaries, pre-computed quantiles, are also not good enough
because they usually don't give us the granularity,
and also they can't be compared across instances.
So if I've got a dozen pods, I need to have super detailed histogram data in order to do a proper analysis
of my request, because it's okay to have 10 milliseconds of latency on a request,
but it's not okay when 5% of those are so slow they're useless to the user.
The typical is 10 milliseconds, but 5% of them are 10 seconds.
I can't have that from my service SLA perspective.
So I need more and more and more and more histograms,
but right now they're just super expensive.
And that's because Prometheus, in the same, like when we talked about the metadata,
where we said Prometheus throws everything away and everything is just like floating point numbers with timestamps, essentially.
That's the same for histograms, where the other part of the information is that these are all buckets belonging to the same histogram. Now, every bucket, that counter, becomes its own time series in the Prometheus server. So every bucket you add comes with the full cost of a new time series, with no potential of putting this together in some way or compressing it in some way. And there's decades of research on how to represent distributions in an efficient way.
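To make that cost concrete, here is a hypothetical histogram as it appears on a /metrics page, where every bucket is its own counter series, together with the usual PromQL query that turns those buckets into a quantile estimate:

    # HELP http_request_duration_seconds HTTP request latency in seconds.
    # TYPE http_request_duration_seconds histogram
    http_request_duration_seconds_bucket{le="0.01"} 2262
    http_request_duration_seconds_bucket{le="0.1"}  2399
    http_request_duration_seconds_bucket{le="1"}    2410
    http_request_duration_seconds_bucket{le="+Inf"} 2412
    http_request_duration_seconds_sum 129.4
    http_request_duration_seconds_count 2412

    # Estimated 99th percentile latency, aggregated across instances
    histogram_quantile(0.99, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))

Every extra bucket (every distinct le value) adds another full time series, which is exactly the expense being described here.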
And now that I have more time to work on Prometheus,
my boss also likes this topic a lot.
So perfect opportunity to really go into this.
I had a little talk at PromCon where I was giving my current state of research.
And now at this conference, so many people and so many companies and organizations are interested in that. It was really exciting. And the idea is to get something where we could have way more buckets, or even some kind of digest approach, that plays well with the Prometheus data model. So it's a true challenge, and it will be fairly invasive, because it also changes how Prometheus works: the storage engine, the evaluation model. Because suddenly you have something that's not just a float, it's a representation of a distribution. But the idea is that we will have very detailed and not very expensive histograms
And yeah, I'm very hyped about this.
That is so cool.
That is so cool.
So you mentioned something there which reminded me of a discussion which we had earlier. And that was around being more open and getting the community more involved in what is happening in Prometheus. So you or maybe Fred mentioned about the monthly community calls,
the virtual calls.
Who would like to cover that?
Sure.
Yeah, we're trying to be more open with the wider developer community
and our wider user base.
And a lot of people have found that the Prometheus developer team
is a little closed off and a little opaque.
So we're now doing monthly public meetings
and sharing what the developer team is up to
and taking more input from the community
in order to be a better open source project.
So how can users join those monthly meetings?
Yes, on our website we have an announcement area for those community meetings.
Yes, they're alternating so that they are compatible with Asian time zones and American time zones every other month.
So that hopefully allows worldwide participation.
Do we announce them on mailing lists or Twitter or something?
We do announce them regularly on Twitter and the schedule is open.
People can come and just ask their questions.
We're super happy to answer them
to the best of our abilities.
Thank you. That's a great way of ending this, in that there's no ending. There are other ways that people can join this, not just... because this is one-sided, people are listening to us, but that's a way for them to participate in Prometheus, to get to know more about Prometheus.
So when is the next monthly meeting?
Do you know?
I think we just had one, so it'll be next month.
Okay, so December.
Yeah.
Right.
31st of December, I'm sure.
I believe it's every first Wednesday of the month.
And then the opposite time zone is the third Wednesday of every month.
Whatever.
I think we should look it up for the record. We should provide a link in the show notes.
We will do.
Thank you very much, Ben.
Thank you very much, Fred.
And thank you very much, Bjorn.
It was a great pleasure having you.
And I'm so excited about what you will do next.
Thank you. Thank you.
How often do you think about internal tooling? I'm talking about the back office apps, the tool
the customer service team uses to access your databases, the S3 uploader you built last year
for the marketing team, that quick Firebase admin
panel that lets you monitor key KPIs, and maybe even the tool that your data science team hacked
together so they could provide custom ad spend insights. Literally every line of business relies
upon internal tooling, but if I'm being honest, I don't know many engineers out there who enjoy
building internal tools, let alone getting them excited about maintaining or even supporting them. And this is where Retool comes in. Companies like DoorDash, Brex, Plaid, and even Amazon use Retool to build internal tooling super fast. The idea is that almost all internal tools look the same: they're made of tables, dropdowns, buttons, and text inputs, and Retool gives you a point, click, drag-and-drop interface that makes it super simple to build these types of interfaces in hours, not days. Retool connects to any database or API. For example, to pull data from Postgres, just write a SQL query and drag and drop a table onto the canvas. And if you want to search across those fields, add a search input bar and update your query. Save it, share it, it's too easy. Retool is built by engineers, explicitly for engineers. And for those concerned about data security, Retool can even be set up on-premise in about 15 minutes using Docker, Kubernetes, or Heroku. Learn more and try it free at retool.com slash changelog. Again, retool.com slash changelog.
And by our friends at Square.
We're helping them to announce their new developer YouTube channel.
Head to youtube.com slash square dev to learn more and subscribe.
Here's a preview of their first episode of the Sandbox Show,
where Shannon Skipper and Richard Moot deep dive into the concept of idempotency.
Welcome to the pilot episode of The Sandbox Show, a show where we'll... A YouTube show.
...where we'll deep dive into subjects that developers find interesting.
Don't worry, there will be plenty of live coding.
I'm Shannon and this is Richard, and we're going to cover a broad range of topics as the show evolves, but for today,
what are we going to be covering?
On this first episode, we're going to be covering idempotency.
We had talked to people in our community, and the thing that people seem to be really confused by is this concept of idempotency and how it relates to interacting with an API.
And so I didn't do some Googling on this beforehand, but I know that you did.
I did.
So the definition of idempotency comes from idem and potent. So idem being same, and potent, power or potency.
So it's the same potency.
All right.
Check out this full-length show and more on their YouTube channel at youtube.com slash
square dev or search for Square Developer.
Again, youtube.com slash square dev or search for Square Developer.
It is the 21st of November 2019. It's the last day of KubeCon North America. It's been a sunny day. It's been a great
day so far. We had a great number of hosts and guests
on this show. No, there was only one, it was just me. We had a great number of guests on the show.
Just earlier I was talking to Bjorn from Grafana, Fred from Red Hat and also Ben from GitLab and
they were all on the Prometheus team, very passionate, a lot of interesting things that they've shared with us.
And now we have Tom from Grafana, and we have Ed also from Grafana.
And I'm also one of the Prometheus maintainers.
Oh, thank you.
I mean, I have seen your PRs here and there,
but yes, another Prometheus maintainer.
So the reason why I was very excited to speak with you was,
I know that you have a very passionate view on observability,
on what it means for a system to be observable.
And one of the key components in this new landscape is Kubernetes; all these stacks, the layers are getting deeper and deeper. So to understand what is happening in this very complex landscape, you need observability tooling which is mature, which is complete.
So tell me a bit about that. Yeah, I mean, thank you for having us. Observability is one of these
buzzwords that's been going around a lot in the past few years. I think, you know, I've just been
asked a lot the past few days: what is observability? How does Grafana fit into the observability landscape? I think, you know, observability was previously kind of defined around these three pillars: metrics, logs, and traces. And then this past year it was trendy to kind of bash that as an analogy, and some of it was rightly so, some of it maybe less so. I still sometimes think about it like that, but I try to avoid thinking about the particular data type, the particular way you're storing it, the way you're collecting that data, and I try and think more about how people are using that data.
So for me, observability is about any kind of tooling, infrastructure, UIs, anything you build that helps you understand the behavior of your applications and its infrastructure.
I think that's something really important to emphasize, because at the end of the day,
it's about the stories that we tell, right? And then we use data, some form of data, to tell a certain story. And whatever data is relevant for that story, use it. It doesn't matter what you call it, as long as the focus is: what are you trying to convey, what are you trying for someone to understand, what point are you trying to make? Right? It doesn't matter what you call it,
as long as you don't forget what this is all about.
So I'll give you an example that I think is really relevant,
at least to Ed and I.
We were in Munich two weeks ago for the Prometheus conference.
Great event, 200 or so people coming to just focus on Prometheus.
And towards the end of the first day, Ed, your pager went off, right? Our hosted service was having an issue, and it took us two hours to diagnose it. We were using all of our tooling to understand what went wrong. And I think, at the end of it all, we still don't actually know the root cause yet. I mean, once we figure it out, we'll put it on the blog. But the point of the story is more that a few days later,
after we'd got back from PromCon, after we all sat together,
after we had a video call with eight or nine of the team members on
and we were fishing through all of our metrics, all of our logs
and all of our traces to try and figure out what really happened
to try and get to that root cause.
That was, for me, such a valuable experience: dogfooding our own products, dogfooding our own projects that we work on, and using them to try and understand what went wrong, and try and build that picture. And, you know, we've got graphs, we've got log segments, we've got everything we could possibly gather together to try and understand why. You know, a node failure and an etcd master election and then a network partition,
and everything seemed to go wrong at once,
but really what was the root cause?
And that was exciting.
We also had David and members of the Grafana team
join in to see a live example
of how people were using the tools they're building
and how they can improve the UX of those tools.
And I think he ended up recording it
and showing it to more people in the team, to go, like, look, he wanted to click this, but it wasn't quite in the right place, so it wasn't quite the right thing.
That's a great story. One thing which I really like about this story is how relevant different elements of observability, for lack of a better word, are, how important certain elements are. So when you're trying to dig for a root cause analysis, logs are very, very
important, right? So metrics are getting a lot of attention,
traces are getting a lot of attention, but I'm not seeing
the same thing for logs. So other than Loki, which is an
open source project, is there anything else out there that I'm
not aware of or for logs
specifically that integrates with prometheus that integrates with zipkin or jaeger or what
or whatever else you may have that will give you this root cause analysis tooling yeah i think the
Yeah, I think an interesting one here is: when I joined Grafana Labs 18 months ago, they were already big users of Zipkin, but not in the traditional use case. They weren't using it to visualize requests spanning multiple microservices; they were actually using Zipkin mostly for logs, for request-centric logging, because Zipkin has these kind of basic logging features. I said Zipkin then, didn't I? I meant Jaeger, didn't I? Yeah, I meant Jaeger, sorry, big users of Jaeger. Yeah, it's fine, we can edit that out. But yeah, so they were big users, but not for distributed tracing. We came along and we wanted to use it for the visualization of the request flows through all the microservices. But I'd never really seen Jaeger used primarily for something other than visualizing request flow. So I guess you could think about the tracing tools as a more request-oriented way of logging. I mean, obviously there are a lot of logging vendors out there; a lot of them are represented at KubeCon. I think the most popular one for Kubernetes has always been Elastic, the Elastic stack, ELK. That's what most people use, and it's a great tool. One of the things that always impresses me about Elastic is you can pretty much do anything with it. I've seen people build their whole BI and analytics stack on Elastic, I've seen people use it for developer-centric logging, people use it for audit logging, people use it for security analysis, and people are using it for actually searching web pages as well, which is kind of fun, because that's what it was originally used for.
Loki, I know you said apart from Loki, but Loki is not like Elastic in that sense. We are just focused on the developer-centric logging flow. We just want to give you basically what you would see in kubectl logs; we want to give it a bit better user interface, so you can kind of point and click and see it in Grafana. And honestly, we've touched on dogfooding already; I think it's one of our superpowers at Grafana Labs. We build the product we want to use as developers. And really, the reason I started the Loki project was because you can't kubectl logs a pod that's gone away. One of the common failure modes was pods would die, disappear, get rescheduled, etc., and I wanted to know what was going on in that pod before that happened. That's why we built Loki: we wanted basically kubectl logs, but with a bit more retention. And so here's an interesting one: kube-cuddle, kube-cuddle, kube-C-T-L, what do we say?
Kube Control?
Kube Control, really? There are so many ways, yeah. There are so many ways, no.
Kube-C-T-L, from my perspective.
Kube-C-T-L, not kube-cuddle.
No.
So wasn't there an unofficial logo which was a cuttlefish?
Yes, there was. There was an unofficial logo in a couple of places, yet the cuttlefish gets mentioned.
I like the cuttlefish one.
I mean, yeah, C-T-L... sysctl? Maybe that's where I have...
I would say sys-cuddle.
Sys-cuddle.
But did you used to say sys-cuddle before?
No, I mean, maybe not. And this one, I really like it: it's definitely io-cuddle and not io-control.
Okay.
Earlier, Ben was mentioning all the different building blocks that exist in the observability landscape in the CNCF, and I can see Loki as one of those building blocks. The one thing which I really like about Grafana is that it doesn't limit which data sources you can use. So if you want to use ELK, you can do that. If you want to use Stackdriver, you can do that (which is logging from a vendor), perfectly fine, no problems.
And if you want to use Prometheus, a very popular project,
a graduated project, second graduated project in CNCF,
you can use that as well.
And it's a combination of all these tools and many others.
InfluxDB.
We've got over 60 different data sources.
There you go.
I mean, I don't even know them all.
Yeah, I mean, I couldn't name them all. You can combine them in innovative ways, and you can almost do the right thing, the right thing being relative and relevant for you. So what is the right thing for you? If you want to use Loki, so be it; if you want to use Splunk, so be it.
Yeah, well, so the thing I think is even more cool is it's not just about having these data sources and being able to pull this data into dashboards and the Explore mode. What we're working on is, with Loki, we built this experience where, because we have this consistent metadata between the metrics and the logs, we allow you to switch between them automatically.
So given any Prometheus graph, any Prometheus query, we can automatically show you relevant logs for it.
Now, that was a very Loki-specific experience. We've been working really hard to try and bring that to other data sources, so hopefully, as long as you curate your labels correctly, you'll now be able to achieve that kind of experience between Graphite and Elastic.
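To make that concrete, here is a rough sketch of the idea in Go; it is an illustration, not Grafana's actual implementation, and the helper and label names are assumptions. Given the labels behind a Prometheus query, you can build a Loki-style stream selector from the labels the two systems share:

package main

import (
	"fmt"
	"strings"
)

// buildLogSelector turns a set of shared labels (for example, taken from a
// Prometheus query) into a Loki-style stream selector. Hypothetical helper.
func buildLogSelector(labels map[string]string) string {
	var matchers []string
	for k, v := range labels {
		matchers = append(matchers, fmt.Sprintf(`%s=%q`, k, v))
	}
	return "{" + strings.Join(matchers, ", ") + "}"
}

func main() {
	// Labels assumed to be consistent between the metrics and the logs.
	labels := map[string]string{"namespace": "loki", "job": "loki/ingester"}
	fmt.Println(buildLogSelector(labels)) // e.g. {job="loki/ingester", namespace="loki"}
}

The whole trick is that the same label set identifies both the metric series and the log stream, so switching between them is just a query rewrite.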
This is something I didn't really understand until I joined Grafana Labs: the team is so committed to this big tent philosophy, to enabling these kinds of workflows and enabling other systems. And I really think the Grafana project is the only thing out there that really allows you to combine and mix and match, and is so much more additive to the ecosystem than other projects that are like,
no, you can only use this data source. You can only talk to this database.
A bridge. A bridge to all sorts of things.
We're Switzerland, right?
Yeah, right. I like that analogy very much. So we have Ed here. I hear that he's quite
involved with Loki. And when you said we, Tom, I'm sure you meant the royal we, because it's mostly Ed, right? Let's be honest, Loki, it's mostly Ed. So tell us, Ed, about Loki. Why do you like it? What do you like about it? Where is it going?
Yeah, I can still remember, probably about 10 months ago, when I was interviewing with Tom and we were talking about Loki, and it was new to me at the time. The first question I asked was, isn't this already a solved problem? Don't we have solutions for logging already? And then, as he explained what I would almost call a simplification of how Loki stores data compared to other systems, all of that immediately scratched an itch that I've had. I've been a developer my whole life, and the two things that I do most with logs is I deploy software and I tail them, and I look for errors, right? And then I'm running the software and it's broken, and I've got to go find where it's broken.
So what Loki does really well is we only index the metadata, the label data that is part of your logs, and not the full text of the log. So from an operating and overhead perspective, it's much leaner. And as long as you're looking for data and you know the time span, and you know the relevant metadata, the server it was on, the application, you're there; you're looking at your logs. And the tailing aspect is included as well with Grafana. So I'm like, wow, that's what I wanted, right? And the big advantage from an operating perspective with Loki is that the index scales according to the size of your metadata and not your log content. So we're almost a couple of orders of magnitude smaller on our index than we are on our stored log data, and then we can take advantage of object stores and compression to store data cheaply. So it's a really nice optimization on log content when you're a developer or an operator and you just want to get to your logs right now: I want to look at this application's logs.
And last week, we were regularly going, let's go look at what the journal logs say for this node, what is going on here, can we add a regex filter on there for out-of-memory messages? Like, oh, that's a lot of those, right? And recently we've been adding support for metric-style queries against your logs. To me, this was like the grep piped into word count workflow: I want to know how often this is happening. But it gets better, because now I can see over time how often it happens, and if that out-of-memory message keeps showing up, that's probably a problem. That's been really exciting, and I feel like that's resonating with a lot of people we talk to here as well: this is what I want for my logs. There's way more you can do with your logs than that, right? Absolutely. And some of these other projects are much better suited for the different kinds of queries you might do where you need a full index. But in a lot of cases, the Loki model is really, really perfect for that.
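As a minimal sketch of the model Ed is describing (an illustration, not Loki's implementation): only the labels identify a stream, the line content is just scanned by a filter at query time, and a metric-style query is essentially that filter plus a count per time bucket. All names below are hypothetical.

package main

import (
	"fmt"
	"regexp"
	"time"
)

// Entry is a log line with its timestamp. The stream it belongs to is
// identified purely by a small set of labels, which is all that gets indexed.
type Entry struct {
	Time time.Time
	Line string
}

// countOverTime mimics a metric-style query over logs: filter lines with a
// regex (the "grep"), then count matches per time bucket (the "wc -l", over time).
func countOverTime(entries []Entry, re *regexp.Regexp, bucket time.Duration) map[time.Time]int {
	counts := map[time.Time]int{}
	for _, e := range entries {
		if re.MatchString(e.Line) {
			counts[e.Time.Truncate(bucket)]++
		}
	}
	return counts
}

func main() {
	now := time.Now()
	entries := []Entry{
		{now, "kernel: Out of memory: killed process 1234"},
		{now.Add(30 * time.Second), "service started"},
		{now.Add(2 * time.Minute), "kernel: Out of memory: killed process 5678"},
	}
	oom := regexp.MustCompile(`(?i)out of memory`)
	for ts, n := range countOverTime(entries, oom, time.Minute) {
		fmt.Println(ts.Format(time.RFC3339), n)
	}
}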
I really like how you take a really simple idea. You start as simply as you possibly can. Where do you stop?
I look at Elastic, built on Lucene, probably as a great building block, and I look at a lot of the projects that came out of that as being generally useful in a lot of places. But I don't think big data ever quite hit its promise. So one of the things I've always tried to do, I think with everything I've done, is be very, very focused on a particular story, a particular end user, a particular use case.
You know, with Loki, that use case was the incident. I mean, I'm still on call at Grafana Labs. I don't know how Ed feels about that, but I think I still occasionally get paged at 3 a.m. And I really wanted tooling that would help me, very quickly, in a sleep-deprived state, get to the problem as quickly as possible. And that's what the focus has always been on with Loki. And so you asked, where do we stop? Well, I think we don't try and make Loki do tracing, we don't try and make Loki do BI, we don't try and make Loki do use cases that are beyond that sleep-deprived 3 a.m. incident response drill. I think we stay with these tightly focused stories, and that's how we build great projects.
I mean, I learned that from Prometheus. Prometheus was, and still is, incredibly focused and incredibly resistant to feature and scope creep. And so I learned a lot through the Prometheus project, and I'm really keen to apply that to this project and maybe future projects. I'll caveat it with one thing: the way we built Loki so quickly is we actually took all of the distributed systems algorithms and data structures from another one of my projects, from Cortex. And so Loki is really just a thin, well, maybe not so thin anymore, but it started off as a thin veneer wrapped around the same distributed hash tables, the same inverted indexes and chunk stores that we used in Cortex. And that's how we got the first version of the project out so quickly. So I'm all for code reuse, I'm all for reusing data structures and sharing and this kind of stuff. But I just think the end solution that you build it into should be really, really focused.
So Cortex is really cool.
And I would like us to go into that soon.
But before that, I would like to add an extra insight for those that maybe don't know you very well.
You're the VP of product for Grafana Labs.
So why are you being paged?
Because you like it?
Because you want to be close to the tooling?
Because you want to see what people will be getting?
I think that's possibly the most committed VP of product
that I've known.
And that's the right way of approaching it
so that you have a firsthand experience yourself.
Yeah.
Of all those products.
We talk at Grafana Labs about authenticity. We try not to spin the stories we're telling; we try to just tell real stories, authentic stories. I remember having a conversation with the CEO, with Raj, about what it means to build these empowered, distributed teams of really awesome software engineers. One of the ways we encapsulated it: you see a lot on people's Twitter bios something like "thoughts and opinions here are my own." I never want any of my employees to have to caveat their opinions. I trust them all. I want them to feel empowered to speak on behalf of the projects and the company that they represent, and I want them to speak authentically. So part of that is, if you hear me standing up talking and telling a story about why I built Cortex, why we started Loki, why I use Prometheus, why I use Grafana, these are real stories from my actual experience. And I do miss being able to write as much code as I used to. On the flight over to San Diego from London, I actually did a PR for Prometheus, because I'm a software engineer at heart. I do miss it sometimes, but I also see the work that Ed and the rest of the team are able to do, and I just think, as long as I can help, as long as I can build an environment for people to be that successful, then I'm happy.
I think that's a great philosophy to have. And it's really powerful. We can see how important it is to approach things like that, to really believe in it and to operate under that mindset.
Yeah, I try to.
So Cortex, very interesting.
Another interesting Grafana Labs product, project, how would you call it?
Well, interestingly, Cortex isn't a Grafana Labs project.
I started the Cortex project over three years ago, before I worked for Grafana Labs. About a year ago we put it into the CNCF, and so it's actually a CNCF sandbox project used by a lot of companies. Every time I come to KubeCon I meet new companies who are like, oh hey, we use Cortex, and I'm like, wow, I had no idea. We really just started it for our own needs to begin with. Grafana Labs does use Cortex to power our hosted Prometheus product in Grafana Cloud, and so that's where our vested interest is, right? We are doing this because it's the basis of one of our big products.
But also, one of the things is, I like Cortex. In a previous life I worked on Cassandra, and so you'll see heavy influence in Cortex, in the algorithms and in the data structures, from Cassandra. We do a very similar virtual node scheme; we have very similar distribution and consistency and replication, these kinds of things, to Cassandra. I liked Cortex mainly because I was learning this new language, it was called Go, and I thought this would be a great language to do lots of these kinds of concurrent, highly distributed systems in. And so I kind of thought, well, what are the algorithms that I hope will be really easy to implement in Go that would be challenging to implement in other languages? So that was kind of one of my motivations for Cortex.
I also at the time was building a different product, still in the observability space: I was working on something called Scope, and I spent a long time building this. One of the tools I used whilst building Scope was Prometheus, and I very quickly realized that Prometheus was where it was at and was incredibly useful. So that's kind of how I got into the Prometheus space, and then I thought, well, what the world really needs is a horizontally scalable, clustered version of Prometheus, mostly because I just thought it'd be cool to build. And so we started it, we built it, and we kind of learned what the actual use cases it applied to were; we learned as we went. Now, I originally thought long-term storage would be the biggest value of something like Cortex, but now I think it's really something else: we talked about how the Prometheus community and the Prometheus team like to keep Prometheus well-defined and tight and small and easy to operate.
And this excludes a lot of use cases.
This particularly excludes a lot of use cases
that involve monitoring over a global fleet of servers.
And so really, I think the Cortex project,
its main value proposition is about monitoring
lots of servers deployed in a global fleet. Maybe you've
got tens of clusters on multiple different continents and you want to bring all of that,
all of those metrics into a single place so you can do these queries.
And then when we joined Grafana Labs and they had much larger customers than I'd ever worked
with before, we started to experience query performance issues with Cortex. We hadn't really at the time had any very, very large users on it. And as we started
to onboard very large users, they started to complain about the query performance. And so I
guess the past 18 months of Cortex projects has been almost 100% focused on making it the fastest
possible Prometheus query evaluator out there um and that was the
talk i gave at kubecon a couple of days ago uh it was about how we parallelize and cache and
and partially and and emit like parallel partial sums for us to kind of re-aggregate you know and
and we do all of these different techniques to really really accelerate our promql expressions
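A conceptual sketch of that parallel-partial-sums idea, not Cortex's actual code: split the query range into sub-ranges, evaluate a partial sum for each concurrently, then re-aggregate. The evalPartialSum function below is a stand-in for the real query evaluation.

package main

import (
	"fmt"
	"sync"
	"time"
)

// evalPartialSum stands in for evaluating a PromQL-style sum over one
// sub-range of the full query; in reality this would hit the store/queriers.
func evalPartialSum(start, end time.Time) float64 {
	return float64(end.Sub(start) / time.Minute) // dummy value for illustration
}

// parallelSum splits [start, end) into n sub-ranges, evaluates each
// concurrently, and re-aggregates the partial sums into the final answer.
// This works because a sum over disjoint shards can be recombined by addition.
func parallelSum(start, end time.Time, n int) float64 {
	step := end.Sub(start) / time.Duration(n)
	partials := make([]float64, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			s := start.Add(time.Duration(i) * step)
			partials[i] = evalPartialSum(s, s.Add(step))
		}(i)
	}
	wg.Wait()
	total := 0.0
	for _, p := range partials {
		total += p
	}
	return total
}

func main() {
	end := time.Now()
	fmt.Println(parallelSum(end.Add(-24*time.Hour), end, 24))
}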
And then the really interesting thing happened a few months ago, because Thanos... we can't not mention Thanos. Thanos started off a year after Cortex, started by Bartek, who also lives in London, a good friend of mine, and it started to solve exactly the same problems that Cortex was solving, but effectively did it in the completely opposite way; almost every step along the way, they chose the opposite. Thanos has become a lot more popular than Cortex, for sure, and they did a really good job of making it a really easy to adopt system, great documentation, and they really invested in the community. So I learned a lot. Thanos is more popular than Cortex, but I think one of the things we've been able to do recently is take a lot of the stuff we've built and deployed in Cortex to accelerate query performance and apply it to Thanos. And that's kind of exciting, because now we can bring these really cool techniques to a much larger community.
I know this was asked before, but the one thing which I kept thinking
during your talk is, when will you announce that Thanos and Cortex will merge and become one? And I think you made a great joke about it, like, they have, right? They will merge. I know that is not happening, or at least not right now, not that we know of, but the inspiration was Flux and Argo: two very popular projects in the CI/CD space have merged. I think that's a great combination of effort, getting the best of both worlds. I'm sure many are wondering, will that ever happen? It would be cool, but I'm sure it also has its own challenges for that to be the case, for Thanos and Cortex to merge. So we'll watch this space for sure.
I don't want to see merging as an end goal. I think the end goal should be collaboration.
In the same way, one of the things I like about the Prometheus community is they've been so open to adding maintainers because of their contributions, effectively, to other projects. So the main reason I'm a Prometheus maintainer is because I started Cortex. And similarly, Bartek has been added to the Prometheus maintainer team recently as well. So there's a huge overlap between the Thanos maintainers, the Prometheus maintainers, and the Cortex maintainers, and really, I don't think the end goal should be convergence of these two projects; I think there should be increased collaboration between them, and that's what we're working towards. I really like working with the Thanos guys, I really like working with the Prometheus guys, and finding ways in which we can share and collaborate more, share cool examples, try different things in different projects.
That sounds awesome to me.
The deployment models for Thanos and Cortex are completely different, opposite ends of the spectrum, and so maybe they'll never merge, right? Maybe they'll never, because the deployments are so different; maybe they'll stay separate.
But I think the technologies and the libraries they share,
I mean, both Thanos and Cortex use the same PromQL query engine
that Prometheus uses.
I mean, it is the Prometheus query engine.
Both Cortex and Thanos use the same compression format
for their time series data.
You know, we share way more stuff in common
than our differences, really.
And I look at some of the mergers of communities over the past year, and I think they've been announced before the communities have really had a chance to gel and demonstrate the benefits of that merger. So I definitely want to demonstrate the benefits of working together first. We are already working together and we are having some great success, and if that continues, and if we find even more ways to work together, then maybe a merger makes sense. But I'm more interested in the shared code, the collaboration, the shared solutions.
That's a great take. I really like that. It makes a lot of sense, as if you have thought about this long and hard, I would say. So you strike me as the person that always has a couple of projects, side projects, in his back pocket.
Anything that you'd like to share with us?
Anything interesting that you're working on, hacking on, or maybe Ed?
What do you reckon, like Tanka?
Tanka's pretty cool. We should mention Tanka here.
So this is not really my project. There's a very young chap called Tom Brack in Germany who approached us, actually at KubeCon, and, well, he was 17 at the time. He came up to our booth, spoke to Goutham and I, and said, I really like what you're doing with Jsonnet, I really like the whole mixins thing, I really like Cortex, I really like Loki, do you have a summer internship position? And I'm like, a 17-year-old kid is talking to me about Jsonnet. Jsonnet is one of the nichest aspects of this community that I'm aware of, right? And so we got chatting to him, and he did end up doing a summer internship. And about the same time, Heptio was sold to VMware, and the ksonnet project was discontinued. We were big users. I really liked what they were doing with ksonnet. I really liked how it enabled this kind of reusable and composable configuration as code. And when I joined Grafana Labs, we rolled out ksonnet everywhere. So to hear it was discontinued was a bit of a problem for us. We continued to use it, we continued to invest in it. And when Tom Brack came along, we actually re-implemented it in this project called Tanka, with a whole bunch of other really cool improvements that he's done. It's now much faster. It just forks out to kubectl, so we don't have a lot of compatibility challenges. It's got a much more sophisticated diffing mechanism. And this 17-year-old kid has just massively improved the productivity of the engineers at Grafana Labs by really improving the toolchain for our Kubernetes config management. So if anyone here is using Jsonnet, using ksonnet, and wondering what the future holds, I'd encourage you to check out Tanka. It's a really, really cool project.
This is something which keeps coming up over and over again: the community, the openness, the barrier to entry which is so low, and how everybody's there to help you, right? Whatever age you are, whatever inclination you have, whatever you want to do, you can do, and everybody's there to guide you, help you, and accept whichever contribution you want to bring. This is something so valuable which, over the last three days, I keep seeing over and over again. Let's say it's one of the core values of this new community and this new ecosystem, which has grown so much, to 12,000 people. Did you manage to speak to all of them?
Probably about a twelfth of them, right? Yeah, it definitely feels that way. I would definitely agree the superpower for Kubernetes and for the cloud native community as a whole is this openness, is this acceptance.
I really like what
the CNCF has done by having
multiple competing projects
in their
incubation, like Thanos and Cortex are both
in there. And I really look forward to other projects
coming in and doing the same thing.
I think I really like how the CNCF
are not kingmakers in this respect.
I think that openness is great.
And then, no matter what you think about Kubernetes
and its complexity and its adoption,
I think the real benefit of Kubernetes is the openness.
And if you really want to and have the time and the effort
to make a contribution and make a change,
definitely it will be accepted and you'll be embraced with open arms.
And eventually you'll be put in charge of some huge component
and you're like, what?
Yeah, I'm a big fan.
And especially if you're a VP of product, right?
PR to Prometheus.
Yeah, I don't, I mean, I think I've had some PRs into Kubernetes. I'm not sure.
But I don't get to do as much code as I used to. I mean, I do miss it. I think, you know,
you still get to play. I still do a fair amount of config management work because I still help
with the deployments and still building dashboards and occasionally doing PRs to Prometheus and still doing a fair amount of code review.
Not as much as I used to, but I've spent a lot of my time doing all sorts of things now.
Doing marketing work, that's an interesting one.
So, as we're approaching the end of this interview, and also the end of KubeCon, which is an amazing, amazing event: anything specific that you were impressed by, or you wouldn't expect to see and were very happy to see? Any key takeaways?
My story is, as we were talking a little bit earlier, this is my first KubeCon, and I'm new to the open source community. I've worked a lot of enterprise jobs prior to this, and it is really exciting, I have to say, the people that come up to the booth and talk about, like, hey, we use it, hey, we love it. Being part of that, being part of a project... I met someone who is a contributor to Loki who came up, and they were really, really excited. It's a really cool feeling to have people see these tools and actually use them, come talk to you about it. I really enjoy the amount of people interested, the talks that we're giving that are deep dives into these projects that people are interested in seeing. It's such a different experience than the software I've done in the past. I think it's really neat as a developer, even if you're just using these tools, because of the tools and their proliferation and their openness, it's a skill set you can take anywhere with you, right? These are real skills, and I think companies are starting to see the real value in having toolchains that people know by name. You hear Prometheus more and more and more; that's really valuable. And to have that be open source technology is really amazing.
Thank you, Ed. Thank you, Tom. It's been a pleasure having you. I look forward to the next one.
Cheers.
This episode is brought to you by Git Prime.
Git Prime helps software teams accelerate their velocity and release products faster by turning historical Git data into easy-to-understand insights and reports.
Because past performance predicts future performance, Git Prime can examine your Git data to identify bottlenecks, compare sprints and releases over time,
and enable data-driven discussions about engineering and product development.
Shift faster because you know more, not because you're rushing.
Get started at gitprime.com slash changelog.
That's G-I-T-P-R-I-M-E dot com slash changelog.
Again, gitprime.com slash changelog. I would like to say that we've kept the best for last,
but that's something for you to appreciate.
We are definitely ending the KubeCon on a high. Most people are already breaking off and some
have already flown back home. We're still here, so in this way we are officially ending KubeCon with this last interview. I have around me three gentlemen, left to right: we have Jared, we have Marcus, and we have Dan, all from Upbound. You may recognize them by Crossplane, that's a very strong name, and also Rook. So they are some of the ones behind these great projects.
I'll let them maybe speak a little bit about their involvement
and also tell us what they're passionate about,
what their takeaways are from the conference.
So who would like to start?
I'd be happy to start.
So this is Jared.
And I have been a founder and a maintainer
on both the Rook project and the Crossplane project. So I've been
sort of living in the open source cloud native ecosystem for multiple years now. And one of the
biggest things for me that I see consistently is that each KubeCon gets that much more crazy,
that much more lively. And the amount of new people that are coming into the ecosystem
is always a fairly surprising amount. I think anytime that you go to a talk and people ask,
is this your first KubeCon? You see a large majority of the room raising their hands.
And to me, that says that this ecosystem is onto something exciting and it's attracting more people
and it's gaining more adoption. And that's something that consistently excites me a lot.
I see it all the time at every KubeCon.
Yeah, Dan was calling those the second graders, right?
There were a lot of second graders at this KubeCon,
and some fourth graders.
I really enjoyed that; it was a great analogy. The analogy where he was showing how his son was playing Minecraft and hiding the screen because that was the way to survive the night. And yes, everyone at the convention was represented: if it was their first year, they were considered second graders, and everyone else was only a fourth grader, because the project itself is only five years old, and so we're all new and learning this together.
Yeah, it's a great analogy.
Yeah, definitely.
I think personally, that was a really cool analogy for me because I actually graduated from college recently
and I'm fairly young in the community.
But a lot of people have been extremely welcoming and kind to me,
welcoming me into not just the Crossplane and Rook ecosystems, but also the greater Kubernetes ecosystem, welcoming me onto the actual release team for 1.17. Being part of that was super cool, and there are just a lot of people who have been around from the inception of Kubernetes who are saying, you know, you're a young person, come in here, you're welcome, and we value your thoughts and opinions and your efforts.
So it's definitely a cool place to be at KubeCon and being surrounded by really talented people like that.
And actually, I think that's something that speaks a lot to not only the community and the ecosystem here amongst people that are part of this cloud native movement,
but I think that's just open source in general. I've seen a massive change over the past five years, 10 years, and even earlier than that, where you've got these communities that are able to form on these more social sites like GitHub and GitLab, where you're able to get these communities built and be very collaborative in a very open environment. That not only is getting these projects more out there and in the hands of other people,
but it's attracting people that bring a lot of enthusiasm, that feel welcomed because of the way the community is treating people, and it's getting more people involved in open source than have ever been involved before. It's not something just for graybeards anymore. Open source is for everybody now, and it's pretty awesome.
So this is something that was mentioned a couple of times, even I mentioned it a couple of times in these interviews: I'm still surprised by how open and welcoming everybody is. Even though it's been three packed days, even today, everybody was still happy, still smiling, and really happy to answer any questions.
And even though they were really tired, you could see some people had three very hard days and who knows how many months before that.
So Bryan was just saying a lot of the preparation started six months ago.
So some have been at this for a really long time.
And yet, open, welcoming, warm.
It was great.
My first KubeCon, I loved it.
What was your first KubeCon?
This was my first KubeCon.
So you were experiencing that welcoming attitude firsthand.
Yes.
I love that.
That was amazing.
Natasha and Priyanka were talking about the process, and especially Natasha, since she has been in the CNCF for a couple of years, before GitLab. She was saying that the processes which they have in place, all the documentation, are such an important factor in this welcoming community.
I think that's really been recognized as a key thing in the success of Kubernetes and
the open source ecosystem in general. I think that's one of the drivers for it. It's not only
the right thing to do to welcome people in and make everyone feel a part of the community. It's
also in the best interest of the project. And I'm sure Jared will probably talk about this shortly,
but I think that's been reflected in some of the work we're doing as well, where, you know, we're reliant on a strong community to be successful in what we're trying to go after.
So, yeah, it's cool to see that it's not only the right thing to do to treat people well, but it's also beneficial for, you know, achieving whatever goal you're searching for. And speaking about the goals, I think that's another thing that makes the open source projects work
and has people coming to the booth,
being happy to talk about the project.
Maybe they don't understand it at first,
but as you start talking to them,
they realize and you realize
that they have the same concerns
and they need the same sort of outcomes that you do.
And when there's a fit between your tool and what their needs are... The ecosystem of open source is many solutions to the same problem, and each one tackles it in a different way. But it's great when you start explaining what your product does and they latch onto it, and they kind of lead the conversation, because they know how to make what you've offered so far more useful to fit their circumstances.
And yeah, it's good to have those conversations.
I think it keeps that positive attitude.
If everybody walked up and said, what is your product? I don't get it, it'd be a little souring.
And along with that welcoming nature there,
this is a story I really like to share with people because it highlights how things can go
in the completely opposite direction and cause a very toxic environment. And so I will certainly
not mention the project that this happened on. But and it's not in the cloud native ecosystem
at all. It's certainly not a CNCF project because all those communities are super welcoming and
kind. But there was an open source project I got really excited about because it was very aligned
with some of my personal interests. And being a maintainer on other open source projects,
I know how important it is to have a contributor's guide to be able to welcome new people into the
community, but also have pragmatic or practical steps of this is how you build the project. This
is how you add unit tests. This is the criteria for opening a pull request and getting it accepted.
And so I opened an issue on a particular GitHub open source project. And within five minutes or
so, one of the maintainers on that project replied to my request to create a contributor
guide so that I could start helping them out. He told me that it was the dumbest issue he's ever seen. He used some explicit language and
said that he's tired of idiots opening issues in his repo. And I cannot imagine that they ever got
another contributor to join that project ever again because of that completely toxic behavior.
And so there's a spectrum of being welcoming, kind, supportive, and then there's that type of behavior; I don't think anyone else has ever had an experience quite like that.
It's definitely an anomaly, an outlier, but it is the worst way to run a community ever.
Wow. Wow. Okay.
Well, I'm really glad that that's a really bad example, because it's really easy to forget, right? But these things do happen, even today. We don't realize, because we're so privileged to be in such a great community and to have so many genuinely nice people around us, and we do forget that things like these do happen. So what I would say is, everybody that has such an experience is more than welcome to join the CNCF community, right?
Because we will show them that that is not normal.
We'll show them what normal is.
We'll be more than happy to get as many people as want on board
because this is normal and this is good.
Yes.
And I think that speaks to the success of this approach.
I'm not sure how many people were at the last KubeCon,
but this one was 12,000 people.
And I know the first ones, like only four or five years ago,
were like 500, 1,000.
So how much this community has grown,
and maybe this has something to do with it, I think.
And the success of one project can lead to the success of the other projects.
Once you've modeled how to develop a great community and nurture the community with this sort of support to continue contributing,
all the other projects are going to be able to benefit from that.
I'm really glad you mentioned that, Marcus, because I would like us to maybe start looking a little bit at Crossplane. The one thing which, at least to me, Crossplane is, and you can give me your perspectives, is the embodiment of leveling the playing field,
being open, bridges everywhere, right?
Everybody's welcome to the party.
No vendor lock-in.
It's just the opposite of that, right?
We're open.
We embrace everybody.
We are open to anybody working with us.
And this is what we think the future looks like.
So it's this, all the bridges between all the vendors,
all the ISVs, all the services?
That's how I see it.
But how do you see it, Dan?
Yeah, so that's exactly right. And, you know, we pitch the project as the open multi-cloud control plane.
And that's really what it is.
We're really trying to open up all of the different cloud provider managed services
to anyone and everyone and really reduce that barrier of switching between them.
And, you know, it's built in such a way that allows people to add their own extension points
to that. So there's really no one who's not welcome there, right? You could start a cloud
provider in your home lab, in your apartment, and you could add a stack for that with Crossplane (which I'm sure we'll get to later) and extend Crossplane to include that.
And what that does is it really allows people to pick the best solution for their problem. So,
you know, there's, there's a variety of scales of cloud providers, and maybe you just provide
a managed database service, and it has a very specific use case.
And in an enterprise setting, that can be really hard to adopt because it takes a lot of effort and time to bring on new providers and integrate with them.
But if you integrate in a consistent way, then the companies and the groups of people who are providing open source projects that fit certain, maybe niche, needs, those are now a lot easier to use, and you can pick the best thing to fit whatever use case you have.
Yeah, I think that when you're trying to level the playing field, or provide easily attainable access to open source software or to proprietary software, whatever it may be, getting access in a consistent way, across a lot of different options, to a lot of different people and needs and scenarios, that's really part of opening the door there for everybody. And I think that our efforts here are based on this foundation that Kubernetes itself has started. Because if you take a step back and look at Kubernetes, whatever the underlying cloud provider or hardware may be, it abstracts away the infrastructure in the data center and allows your applications to run in a very agnostic way. So Kubernetes kind of started pioneering this trail, where your application doesn't have to worry about the environment it's running in.
You know, it can basically just express itself
in a simple way and then run anywhere. That's a start. But then there's many ways to take that
further. We've heard Dan mention something about stacks. I'm looking at Marcus, because I know that he's been closely involved with various stacks. Can you tell us, Marcus, what stacks are and what stacks are currently available in Crossplane?
Sure. Stacks are a package of resources that Crossplane uses to extend the Kubernetes API
with knowledge of cloud provider resources or any sort of infrastructure resource.
Additionally, applications, but first focusing on the infrastructure resources.
There are stacks currently for Google, Azure, and AWS,
and additional ones, Packet and Rook, all interesting topics.
So taking the example of Google, there's a Cloud SQL MySQL instance. And one can imagine in Kubernetes, creating an instance of that resource, specifying in the spec of that resource, all of the API parameters that
you need to configure that resource in the cloud. And then within Kubernetes, using Kubernetes
lifecycle management, you've created this resource that will be reconciled,
creating a cloud provider resource.
And the byproduct of that is a secret
that you can bind to your application
so that whatever application it is that needs MySQL has access to your MySQL. The way that we've done this in Crossplane is we've abstracted that to, currently, five different abstractions. Maybe there's six, I'm losing count.
So we've got one for MySQL, Redis, Postgres, object storage, Kubernetes engines themselves. And if you're familiar with the concept of the CSI drivers
where there are persistent volume claims and their storage classes: in that setting, you have a deployment with pods that have the intent to be bound to storage, block storage, whatever, and they make a request for, say, 20 gigs of storage to be attached. They don't know, they don't care, how that storage is attached to them, the pods. Somewhere else, a storage class has been configured, and this storage class dictates that storage will be provided through EBS or through any other form of storage that the cloud provider is capable of providing. All the other settings, whether it's faster service or cheaper service, are defined in that storage class.
And what Crossplane's done is take that concept and extend it to all of the other resources
that you could want to use in your cluster or for your applications.
So MySQL and Postgres and so forth.
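To make that claim-and-class analogy concrete, here is a deliberately simplified sketch; the real Crossplane types are Kubernetes custom resources defined in the project's API packages, and the type and field names below are made up for illustration.

package main

import "fmt"

// MySQLInstanceClaim is what an application team asks for: "I need a MySQL
// of roughly this shape", with no mention of a specific cloud provider.
type MySQLInstanceClaim struct {
	Name          string
	EngineVersion string
	ClassRef      string // which resource class satisfies this claim
}

// ResourceClass is configured by the platform/ops team and dictates how a
// claim is actually provisioned (CloudSQL, RDS, Azure DB, on-prem, ...).
type ResourceClass struct {
	Name        string
	Provisioner string // e.g. "cloudsql" or "rds"; hypothetical values
	Parameters  map[string]string
}

// provision mimics the reconciliation step: resolve the claim against its
// class and pretend to create the managed service plus a connection secret.
func provision(c MySQLInstanceClaim, classes map[string]ResourceClass) string {
	class := classes[c.ClassRef]
	return fmt.Sprintf("provisioning %s v%s via %s (secret: %s-conn)",
		c.Name, c.EngineVersion, class.Provisioner, c.Name)
}

func main() {
	classes := map[string]ResourceClass{
		"standard-mysql": {Name: "standard-mysql", Provisioner: "cloudsql",
			Parameters: map[string]string{"tier": "db-n1-standard-1"}},
	}
	claim := MySQLInstanceClaim{Name: "wordpress-db", EngineVersion: "5.7", ClassRef: "standard-mysql"}
	fmt.Println(provision(claim, classes))
}

The application only states the claim; which provider actually backs it is decided by whichever class the operators have wired in, exactly like a PVC resolving against a storage class.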
So MySQL, Postgres, and you mentioned Rook as well.
These are still relatively low-level building blocks.
Do you have higher-level building blocks
for someone that, for example,
wants a type of an application
so that there's a bit more
that's done for you out of the box?
So you don't have to assemble all these blocks yourself?
Yeah, so one of the things
that we're really focused on as a project
is addressing it in layers, right?
So starting with
the lowest level, and then building on top of that, and also allowing other people in the
community to build on top of it. And one of the great values of being standardized on the Kubernetes
API is that we can integrate with a lot of different things. So as Marcus was talking about,
we have a lot of infrastructure resources, as we talked about. And you know, in some ways,
those are abstracted, because they're managed services, which are a little simpler than running your own, you know, MySQL instance on
bare metal or something like that. But you can continue to build on top of that and package
those together. And Marcus alluded a little bit to a different kind of stack that we support as
well, which are application stacks. So a common example that we talk about, just because everyone's usually familiar with it,
is a WordPress instance.
So a WordPress blog, everyone's pretty much familiar with that.
And usually what it takes to do that is somewhere to run it.
So maybe a Kubernetes cluster, and then some sort of deployments into that cluster, which
have the container running in a pod or something like that.
And then some sort of database, MySQL for WordPress, that you need to provision as well
for that to talk to and store posts and comments and that sort of thing. And so what you can do
with crossplane is bundle that up into another sort of custom resource, which is a Kubernetes
concept, which basically allows you to extend the control plane. So all of these infrastructure resources we've talked about are deployed through custom resource
definitions, and then instances of those are the custom resources. So you could extend that to
have a WordPress custom resource definition that says, you know, I need these maybe lower level
concepts, as you were alluding to, to be able to run this application.
And, you know, someone can just deploy this WordPress instance resource, and it will take
care of deploying all those resources in an agnostic manner as well, meaning that it can be
deployed on GCP or AWS or Azure or any other cloud provider, even your on-prem solution, if so be it.
And so that helps someone who's at a higher level. We like to think about a separation of concerns in Crossplane, between someone who would be on a platform or operations team, who defines the available infrastructure, and someone on an applications team (or, for something like a WordPress instance, maybe a marketing team or something higher than that), who is able to deploy things in a consistent manner that their organization has deemed appropriate for their use case.
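A tiny sketch of that idea of an application-level resource composing lower-level claims; the names are hypothetical, and in practice this is expressed as Kubernetes custom resource definitions rather than Go structs.

package main

import "fmt"

// WordPressInstance is the higher-level, team-facing resource. Reconciling it
// would fan out into the lower-level pieces it needs, provisioned wherever
// the organization's resource classes point.
type WordPressInstance struct {
	Name            string
	ClusterClassRef string // where to run the containers
	DatabaseClass   string // which MySQL class backs it
}

// expand lists the building blocks the composite resource would create.
func expand(w WordPressInstance) []string {
	return []string{
		fmt.Sprintf("KubernetesCluster (class %s)", w.ClusterClassRef),
		fmt.Sprintf("MySQLInstance claim (class %s)", w.DatabaseClass),
		fmt.Sprintf("Deployment %s-wordpress bound to the DB connection secret", w.Name),
	}
}

func main() {
	for _, r := range expand(WordPressInstance{Name: "blog", ClusterClassRef: "gke-standard", DatabaseClass: "standard-mysql"}) {
		fmt.Println(r)
	}
}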
So I really like this concept. And one thing, again, on the top of my head, which I would
really like to know if it exists, is: you have Crossplane running in a Kubernetes cluster. Can that Crossplane instance stamp out other Kubernetes clusters,
which maybe have a couple of building blocks already pre-installed?
They're all the same. Does this functionality exist?
Yeah, so if you look at...
When you take a philosophy of treating everything as a resource in Kubernetes, then that allows you to do some interesting things where Kubernetes itself can be treated as just another type of resource.
So, you know, maybe you need a Postgres, maybe you need a Redis cache, but maybe you also need a Kubernetes cluster.
And so being able to dynamically provision, you know, on the fly, bring up a Kubernetes cluster with a certain configuration or certain applications or, you know, certain networking plugins, whatever it may need or policies,
whatever it may be, to be able to, you know, on demand, bring those up and get them as part of
your environment is a consistent experience like with any other type of resource. So I've heard
people many times kind of express how Kubernetes is a platform for platforms. And I think that we're
really starting to see that, that a lot of the base problems have been solved in Kubernetes of,
you know, a declarative API for configuration, active reconciliation controllers that are,
you know, level triggered, not edge triggered. There's all these different philosophies that
went into Kubernetes that have made this platform where we can start building higher level concepts
on top of it. And then the higher you go up the stack, the more opinionated you can become.
So you become more specific to certain use cases. But when you have these building blocks,
and you've got community effort around, you know, bringing them into something that's more useful
and higher up the stack with more functionality or easier to use, you know,
then you can end up with cases where I can just bring up Kubernetes itself and
start using that and treat clusters as cattle; a lot of things are trending towards cattle.
That's right, another trend there. And somebody used one this week, too, something described as cattle that I had never heard before, and I want to remember that and bring that back, because I think it was taking it a little too far. It was like, okay, not everything has to be cattle, but maybe I'm just not on board with it yet. So, new things from KubeCon this week that I still need to process.
Well, I did hear that kubectl, or however your preferred way of saying that word is, was pronounced kube-cattle this week, which is taking that to a whole other level, like cubed cattle.
Yeah, that's a good one. One thing which I don't know enough about and I'd like to know more about is Rook. Where does Rook fit in all of this?
Yes, and I'd be happy to take that one, since I've been working on Rook for just over three years now.
So I believe that where Rook really shines is its focus being on an orchestrator for storage.
If you think about the roots of the Rook project, when we started it more than three years ago, something that we saw, as Kubernetes was still in very early days, is that you would ask people that are using Kubernetes, oh, okay, so what are you doing for persistent storage?
And almost nobody had a good answer to that.
That was a very, very commonly unanswered question because they're just running stateless workloads in Kubernetes. And so we started seeing value of, okay, if we can use these primitives and these patterns that are in Kubernetes and these best practices that are starting to form
around how do you manage an application's lifecycle? How do you maintain reliability
of a distributed system? All these things, these problems were being solved and then being able to
build on top of that with, okay, let's do the same thing for storage. Let's reuse the Kubernetes best practices and patterns to stop relying on external storage or storage that's outside of
the cluster. Maybe it's in a NAS device or a SAN, or maybe a cloud provider's block storage service, or whatever it may be. But being able to bring those into the cluster
and orchestrate them to be able to take advantage
of the resources that are already in the cluster,
available hard drives or different classes of service,
a regular spinning platter disk or SSD or NVMe,
whatever it may be,
but being able to provide storage to applications
in a cloud-native type of way,
going to the full stack there.
And so that's something that we found that got a lot of traction pretty quickly.
And then, you know, it wasn't too long.
It was only a few early minor releases before we started getting production usage of it,
which was always very surprising because it was an alpha-level project
and we were very clear about this isn't intended to be used in production yet.
But we got production adoption pretty early on, right away,
which helped drive the maturity of the project as well.
Wow. Okay. Three years. That's a long time, right?
In the Kubernetes world, where Kubernetes itself has been around for roughly five years, three years is a really long time. Enough to mature, to get to a point where it solves a lot of real-world problems. That's great to hear. I'm wondering, and this is more of a personal interest, does it support LVM? Does Rook support LVM?
Yes, and that's an interesting question, because if you look at the design of the Rook project, it's basically separated into two distinct layers. One of the layers, which is the core functionality of Rook, is this orchestration layer, this management layer that will do the steps necessary to bring up the data layer that's underneath it, to get it running and do day-two operations to make sure it's healthy. And for the storage providers that Rook performs storage orchestration for within your Kubernetes cluster, it's up to that data path to know how to handle LVM
or any other type of storage fabrics and storage presentations
that you can find in a cluster.
So there are a number of storage providers inside of Rook that do work with LVM.
Okay, that's great. I really have to check that out.
Very, very interesting.
Okay, so just to go back to Marcus again, because it's something which is at the back of my mind,
is you mentioned support in Crossplane for AWS, GCP, and Azure, however you pronounce it.
What about the other providers? There's like so many more other providers, and Dan mentioned this, right?
Like any provider can be part of Crossplane.
What does the path for other providers look like that would like to be part of Crossplane?
Sure.
Well, we've stamped out the pattern by creating those stacks.
And in the process of creating those stacks, they were created initially, all of them within
the Crossplane project itself.
And it was interesting, even though it's all inside of one repository, the different providers
were implemented by different developers at different times, adopting different best practices,
what they thought was the best practice at the time, and eventually coalesced into one set of design patterns, which had been sort of the best of breed.
And around the same time, we decided to extract these, what we call stacks,
extract those providers, those stacks out of the Crossplane project
into their own stack repositories.
So, github.com slash crossplaneio slash stack-gcp, stack-azure (and I don't know if I'm pronouncing it correctly), and stack-aws.
And we have additional ones, Rook and Packet. And there's really an easy way to get that started for any other cloud provider interested in being able to provide their managed services through Crossplane and having that abstracted away. If you have a managed MySQL or a managed Postgres, then users can create a claim for a MySQL instance, and one day they're getting RDS, the next day they're getting GCP, the next day they're getting your service. Maybe in one namespace it's resolving to GCP for some production workload, and in another namespace it's reconciling to whoever's cloud-provider-managed MySQL. And again, not just
for MySQL, Postgres, all the
different types. And Packet
is a great example because
before Packet we didn't have the abstraction
for machines. But Packet provides their devices, where they... what, device is the name?
Yeah, it's essentially a bare metal offering that they provide via their cloud provider offering. And they came and wanted to have a stack, and we didn't have support for what we call a claim
for machine instances. So we wouldn't be able to dynamically provision those. So as part of the
core Crossplane project, we now had a stack that wanted to be able to dynamically provision and
integrate with Crossplane. So we were happy to work with them to add the machine instance claim
type that now allows an abstraction that can be used by
other providers as well, because obviously AWS and GCP, etc, have, you know, VMs like EC2,
and that sort of thing, they can also utilize that. So it's just another opportunity for portability.
Another thing to kind of build on what Marcus was saying is, besides just having those best
practices reflected in those stacks in our organization, we also have abstracted out to a library cross-plane runtime, which is kind of
based on the controller runtime project, which I'm sure a lot of listeners who have built controllers
are familiar with. So that's part of the Kubernetes organization. Essentially, what that does is it
gives you a interface for building controllers and running those in a Kubernetes cluster and some best practices for doing that.
While most of our stacks are using that, they're also doing other things, namely interacting with external APIs.
So there's certain patterns that are very common across stacks that do that. So we've been able to abstract those out into a library
and just say, you know, you just need to tell us
for this resource how you want to observe the resource,
create the resource, update the resource, and delete it,
and then provide us methods to do that.
And then the logic that's around that
and actually executing those things
can happen in the runtime library.
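As a rough illustration of the pattern Dan describes (the stack author supplies observe, create, update, and delete methods, and the surrounding logic lives in a shared runtime), here is a minimal Go sketch. It is not the real crossplane-runtime API; the interface, the reconcile helper, and the fake client below are simplified stand-ins for the shape of what a stack author would implement.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Observation is a simplified report of an external resource's current state.
type Observation struct {
	Exists   bool
	UpToDate bool
}

// ExternalClient is a hypothetical, stripped-down version of the kind of
// interface a stack author implements: how to observe, create, update, and
// delete one kind of external (cloud provider) resource.
type ExternalClient interface {
	Observe(ctx context.Context, name string) (Observation, error)
	Create(ctx context.Context, name string) error
	Update(ctx context.Context, name string) error
	Delete(ctx context.Context, name string) error
}

// reconcile is the shared logic a runtime library could provide: given the
// observation, decide which of the author's methods to call.
func reconcile(ctx context.Context, c ExternalClient, name string, deleted bool) error {
	obs, err := c.Observe(ctx, name)
	if err != nil {
		return fmt.Errorf("observe %s: %w", name, err)
	}
	switch {
	case deleted && obs.Exists:
		return c.Delete(ctx, name)
	case deleted:
		return nil // nothing to clean up
	case !obs.Exists:
		return c.Create(ctx, name)
	case !obs.UpToDate:
		return c.Update(ctx, name)
	default:
		return nil // already in the desired state
	}
}

// fakeBucketClient is a toy implementation standing in for a provider-specific client.
type fakeBucketClient struct{ created map[string]bool }

func (f *fakeBucketClient) Observe(_ context.Context, name string) (Observation, error) {
	return Observation{Exists: f.created[name], UpToDate: true}, nil
}
func (f *fakeBucketClient) Create(_ context.Context, name string) error {
	f.created[name] = true
	return nil
}
func (f *fakeBucketClient) Update(_ context.Context, name string) error { return nil }
func (f *fakeBucketClient) Delete(_ context.Context, name string) error {
	if !f.created[name] {
		return errors.New("not found")
	}
	delete(f.created, name)
	return nil
}

func main() {
	client := &fakeBucketClient{created: map[string]bool{}}
	// First pass creates the resource; second pass is a no-op.
	for i := 0; i < 2; i++ {
		if err := reconcile(context.Background(), client, "example-bucket", false); err != nil {
			fmt.Println("reconcile error:", err)
		}
	}
	fmt.Println("exists:", client.created["example-bucket"])
}
```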
So it really lowers the barrier to entry for people implementing new stacks, which I think is really valuable as
we see more and more community adoption. I think just today, we actually saw a cloud provider in
Europe announced that they were using Crossplane and had built a stack for that. And we had very
little input on that. We did a little bit of code review, but they were able to take that library
and some of the documentation we've written
and build their own stack,
largely isolated from any of the work
that the cross-plane community was doing.
And that was some really strong validation for us.
And I think that we'll start to see that happening
a lot more in the next weeks and months.
And it also gets back to the idea
of Kubernetes being a platform for
platforms. Kubernetes and its architecture has enabled Crossplane to now become a platform for
all these other different cloud providers or independent software vendors or whoever to
build their application and get more reach and scope of accessing more customer markets or more
segments or whatever for people to come and start using their software
in this open cloud sort of way
with portability and all these different features
that enable more people to access more software.
Yeah.
So we've heard a lot about AWS and GCP and Azure,
which would make people think that it's mostly about infrastructure, or infrastructure as a service.
Or services, which, again, are still tied to the infrastructure.
But I know that recently you have started,
maybe even finished, integration with GitLab.
So you can get the GitLab resource,
which is a completely different type of resource
that Crossplane enables. Can
someone tell me more about that? I'd be happy to talk
about that. That's something definitely
that I've spent a lot of time on recently.
And so, you know, as we started alluding to earlier, Dan
was talking about how you can create
a Crossplane stack that helps
you deploy your application such as WordPress.
And, you know, WordPress was a good place
to start because it's a fairly simple application.
It's just a container and MySQL
and then maybe a cluster to run that container on.
But then in the CubeCon Barcelona timeframe,
we put a significant effort
into being able to deploy GitLab itself.
And so if you look at the architectural components in GitLab,
they have a Helm chart.
And currently that's their main supported way, the one they started with, to deploy GitLab and everything that comes with it into Kubernetes.
And once you render that out, you know, it's on the order of like 50 different containers, 20 config maps, let's say, all these different resources, which speaks to a fairly complicated application set, right? But if you boil it down, what it really needs is a set of containers to run their
microservices, and then Postgres, Redis, object storage, and that's basically it. So, you know,
we being able to model that and then express in a very portable way that my application needs these
containers and these databases, et cetera,
and being able to deploy that to any cloud is a huge step forward
in being able to easily manage applications,
not just infrastructure, but higher-level applications,
such as GitLab, into new environments
that maybe they haven't been able to run in so far.
Yeah.
Hearing you talk about that made me think of something else, which may sound crazy.
I like that, right?
So I can imagine there being a need for a Crossplane that manages Crossplane, right?
Updates, right?
Because you'd have a Crossplane instance that keeps all these other Crossplane instances up to date, maybe, or the applications up to date. But maybe, I think, there will be something else which will keep the application up to date, because you have the bigger loops, which reconcile maybe less frequently.
And then you keep going in and in and in until you have some very quick loops,
which reconcile every five seconds, 10 seconds, or whatever.
Is this something that you've thought about
or did it come up before?
Yeah, that is not as crazy of an idea as you would think,
or maybe we're also crazy too,
but either way, it's a positive idea.
That's definitely true regardless.
We can go with that. That's fine.
But if you think about the architecture in general
in Kubernetes around controllers that are performing active reconciliation
I mean, it's a great pattern. It's an old pattern too, you know; it's commonly used in robotics, let's say, to run in a control loop and sit there, watch the actual state in the environment, compare that to the desired state, see what the delta is there, and take an operational step towards minimizing that delta between actual and desired. And so the exact example that you brought up, of a Crossplane to manage Crossplanes, that's entirely within the realm of reason. It's a set of controllers that can watch the environments and make changes to continue to drive them. So if there's a new update to Crossplane, you know, the single control plane could watch that, see that there's an update, and take the imperative steps within this controller's reconciliation loop to upgrade the application and get it to the newest version. But it's all just the operator pattern and controller patterns inside of Kubernetes. And you can use that to manage basically any resource. And so I think it's a good idea to be able to manage Crossplanes, because if you think about it, not everyone's going to want to run and manage their own Crossplane. And so I think that there's definitely value in being able to automate that and take some of that effort away from people, and let the controllers and the machines do that for you, so that you can have a Crossplane instance that's hosted for you as a service
and be able to get all the benefits out of your Crossplane
without having to manage it yourself.
Let the software do that for you.
And I think there's definitely value in that
that we see for sure.
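The control-loop idea described above (watch the actual state, compare it to the desired state, take a step that shrinks the delta, and go around again) is easy to show in isolation. The following is a deliberately toy Go sketch with nothing Crossplane- or Kubernetes-specific in it; a replica count stands in for whatever a real controller would manage.

```go
package main

import (
	"fmt"
	"time"
)

// step takes one corrective action toward the desired state and reports
// the new actual state. A real controller would call a cloud or cluster API here.
func step(actual, desired int) int {
	switch {
	case actual < desired:
		return actual + 1 // e.g. create one more replica
	case actual > desired:
		return actual - 1 // e.g. tear one replica down
	default:
		return actual
	}
}

func main() {
	desired := 5
	actual := 1

	// The reconciliation loop: observe, compare, act, and go around again.
	// Real controllers run forever and also react to watch events; this one
	// stops once the delta reaches zero, just to keep the example short.
	for actual != desired {
		fmt.Printf("actual=%d desired=%d -> taking a step\n", actual, desired)
		actual = step(actual, desired)
		time.Sleep(100 * time.Millisecond) // a stand-in for the loop interval
	}
	fmt.Printf("converged: actual=%d desired=%d\n", actual, desired)
}
```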
Okay.
So this, in my mind, set us on a path
that requires me to ask the next question,
which is what big things do you have on the horizon that you can share?
Yeah, I think scheduling is one area that we're looking forward to designing and approaching.
So when you have these Kubernetes application workloads,
the concept that was raised earlier of
bundling your application and its managed resources as a sort of single component,
you're going to need some sort of way to describe where to run that application. What cluster should
it be run on? Which managed service should it be using? So currently, the way that these abstract types, these MySQL instances, these Kubernetes clusters, resolve is through label selectors. So you've described a class, named that class, and set some options on that class. But right now you're referencing it by name.
And so an area that we'd like to figure out is how we can do that dynamically.
So scheduling it based on perhaps cost, perhaps based on the region, the locality,
the affinity to another workload.
There's all sorts of areas that we can really go into there.
Maybe the performance of a cluster or an application is sort of failing,
and so that could lead to an application being bound to another application in some sense.
So lots of layers of abstraction here and lots of fuzzy decision making that can
really provide a better application deployment experience.
And building on what Marques is saying there: if you take a look at what the scheduler does inside of a Kubernetes cluster, the in-cluster scheduler, its job is to know about the topology of the cluster, know about the resources that are available in the cluster, and then make the best decisions about where a pod should be scheduled to, where it should run, based on: is that node overloaded? Or do I need to evict some pods somewhere? Or does it match the particular hardware resources that are
available on a particular node? So then if you take that idea of Kubernetes as a control plane,
figuring out where pods should run across nodes in a cluster, and then go a higher level where
you have something like Crossplane, which is a control plane that's spanning across multiple
clouds, multiple clusters, on-premises environments.
But it's a higher level that is aware of the topology of all the resources that are available and then can make these smart scheduling decisions about where should an application run based
on whatever constraints it thinks is most important.
So this whole idea of scheduling that was done in cluster for Kubernetes can definitely
be raised up, like Marques was talking about, to make decisions more at a global scale. That's really cool. I'm really looking forward to what's going to come
out of this because it's super exciting. And I know that, you know, different providers and
different teams are tackling this in their own specific way. So whoever gets there first, or
even if it's like multiples, it'll be a great moment because it will open up other possibilities, right?
And it's all building blocks, next steps, next steps.
This is really, really exciting.
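Since the scheduling discussion above is about future design work, there is no real API to show yet. The following Go sketch is purely speculative and just illustrates the kind of constraint-based placement decision being described; the target list, cost figures, and constraint fields are all invented for the example.

```go
package main

import "fmt"

// Target is a hypothetical place a workload could be scheduled to:
// a cluster or managed service in some provider and region.
type Target struct {
	Name        string
	Provider    string
	Region      string
	CostPerHour float64
	Healthy     bool
}

// Constraints is an invented set of placement preferences for a workload.
type Constraints struct {
	Region  string  // required region; empty means "anywhere"
	MaxCost float64 // upper bound on hourly cost
}

// pick returns the cheapest healthy target that satisfies the constraints.
// A real scheduler would weigh many more signals (affinity, load, failures).
func pick(targets []Target, c Constraints) (Target, bool) {
	var best Target
	found := false
	for _, t := range targets {
		if !t.Healthy || t.CostPerHour > c.MaxCost {
			continue
		}
		if c.Region != "" && t.Region != c.Region {
			continue
		}
		if !found || t.CostPerHour < best.CostPerHour {
			best, found = t, true
		}
	}
	return best, found
}

func main() {
	targets := []Target{
		{Name: "gke-eu", Provider: "gcp", Region: "eu-west", CostPerHour: 0.32, Healthy: true},
		{Name: "eks-eu", Provider: "aws", Region: "eu-west", CostPerHour: 0.29, Healthy: true},
		{Name: "aks-us", Provider: "azure", Region: "us-east", CostPerHour: 0.25, Healthy: false},
	}
	if t, ok := pick(targets, Constraints{Region: "eu-west", MaxCost: 0.40}); ok {
		fmt.Printf("schedule workload to %s (%s, %s) at $%.2f/h\n", t.Name, t.Provider, t.Region, t.CostPerHour)
	} else {
		fmt.Println("no target satisfies the constraints")
	}
}
```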
So as we are approaching the end of this great discussion, which I'm sure we can continue, one thing which I'd like to mention is that the way I got to learn about Crossplane is via your YouTube live streams, the TBS, I believe. And Dan was the last one that I've seen, I think on the last stream, and it was great to see that in action. So can you tell us more about how that works, where the idea came from,
how it feels to be on the other side? Absolutely. So if anyone out there wants to go watch some
very low quality videos, I disagree. We do a live stream every two weeks. And that's something that
we got ramped up shortly after I joined Upbound.
And it's really just a time.
It's very informal.
And it's a time for us to talk about new things in the Crossplane community, new things in Kubernetes that are related.
And then also to do a lot of really live demoing.
And actually someone asked me today, you know, why don't you just record your demos and post them on there, and then you can make sure that everything goes smoothly and that sort of thing. And the
reason we don't do that is because we think there's a ton of value in messing up, right?
There's a lot of different configuration that can happen when you're provisioning things across
cloud providers on prem, lots of different services, lots of different plugins. There's a
lot of different ways you can mess up, which is not really a reflection of the system or even of your own ability; it's just complicated. And when you provision things and you run into issues and work through them, it shows people how to troubleshoot when they run into those same issues. It also adds a layer of humanity to it, I think, that allows people who are tuning in, especially live when they're dropping comments and that sort of thing, to be able to talk
about what their individual experiences are.
I like to say we've had some other people host as well on some episodes.
We actually recently had multiple people hosting a single episode; you might want to skip that one.
There were some technical difficulties.
I apologize.
I'm not a visual engineer.
But what I like to encourage people to do is, you know, talk about something they're
interested in outside of Crossplane.
So a lot of times I'll start a show by talking about the Utah Jazz, which is a basketball
team I really love.
And I'll encourage other people to do the same because, you know, when it comes down to it, the end users of Crossplane and the people that build Crossplane are going to have to be really closely integrated, right?
Because it is a platform that is going to inherently have to make some architectural decisions.
And we want to be best informed about how users want to use the platform so that we can build it to meet those specifications and then encourage them to come in and build parts of it as well. So I think just building that community and having fun and talking about, you know, you
can do all these things and we're excited about them and we'd like for you to come join
us on this journey.
I think that's really the purpose of TBS, which is The Binding Status, which is kind
of a play on, you know, claims binding to classes.
I think that's the purpose of the show.
And we had a couple of people come up and mention that they'd watched episodes, which I was astounded by, and I apologize for the time that they had wasted. But it was, personally and as an organization, really validating to say, you know what, people care about what's going on here, and they feel welcomed into the community by this style of communication.
So there's one big downside to this, from my perspective, which is that I enjoy watching the shows more than trying Crossplane out.
So the risk there is that I will continue watching all the Crossplane shows forever and never try Crossplane, because it's so exciting to watch
that I spend all the time watching rather than trying it out.
So that's one of the real risks of this.
Well, I think the solution to that is we just have to have you come on and host and then you'll be forced to try it out.
Oh man!
With hundreds of people watching!
Just put you in the hot seat.
Right, yeah.
A forcing function.
Yeah, that's actually a great idea, I have to say.
I don't know how I'll get out of that one.
Any last parting thoughts?
Well, it's really easy to try it out,
so you don't have an excuse.
You just helm install it.
And as long as you've got some cluster somewhere, install it in Kind or install it in k3s on your laptop. Docker on Mac includes a Kubernetes engine now.
So from there, you can helm install your Crossplane, and from there start
provisioning more clusters,
more managed resources, the
Kubernetes applications.
And another piece
I'd like to piggyback off
the idea of the videos is that we have a lot of documentation.
We've worked hard to update this documentation, both on how to build stacks and how to use Crossplane.
We've been updating it every version.
And we're trying to get more strict about making sure that our docs are updated with every release.
And we've been releasing the product faster and faster.
The last release was 0.5, and before that was our first minor patch, 0.4.1.
We've worked on our build pipeline so that we can get the updates out there quicker.
So with all of this, you have documentation to test it out with. And I'd like to say that, yes,
the video is probably one easy way to consume it. So for different people, different things are going to work. Whether it's reading the docs, whether it's installing the product and just
trying it out by hand, or whether it's watching us fumble at the kubectl command line. YAML is not the easiest thing to just grok at a distance.
Sometimes you need to watch somebody stumble over how to best describe it
or just read thoroughly what we've done or jump in the code.
Visit the GitHub project, star it.
That stuff is really useful to us.
Leave issues for any kind of ideas that you would like
to see Crossplane expand or delve into. And a closing thought on that, that I strongly
believe in is that I consistently see that some of the best feedback and ideas for a project come
from brand new users that have never seen it before. Because, you know, you could be, you know,
a project maintainer, let's say, and you're consistently living in that code base
and you know all the ins and outs and the idiosyncrasies of it.
And you kind of get, you know, a very specific, you know, myopic view on it almost.
But then you have a brand new person try it out for the first time with fresh eyes.
And they see something immediately that you've been completely blind to
for the past six months.
So some of the best feedback comes from brand new users.
So we are super open to new people trying it out and giving us their ideas because they're probably going to be good ideas
as well. Okay. So on that note, I really like that idea. How about we stop the interview now
and I can start trying some Crossplane stuff out for the first time. You can watch me
and tell me all the things that I'm doing wrong. I'd really like that. Or maybe you can tell us
what we've been doing wrong. Or that, yes.
This will get crazy.
I'm really looking forward to that.
Dan, thank you very much.
Marques, thank you very much.
Jared, thank you very much.
It was a pleasure having you.
I'm so excited that you were on the show and I'm looking forward
to what will happen next.
Thank you so much for having me.
Thank you.
It was a pleasure.
Yeah, we really love ChangeLog.
Love all the shows. Go Time.
Subscribe to the Master feed.
You get everything.
It's the best.
Thank you, Marques.
Thank you.
Thank you.
All right.
Thank you for tuning in to the ChangeLog.
You heard Marques.
Subscribe to the Master Feed.
It's our majestic monolith.
Get this show, brain science, founders talk, and everything we produce all in one place.
You've got nothing to lose.
Special thanks to our friends at the CNCF for making this series possible,
and to Gerhard Lazu for conducting these awesome interviews.
Our music is produced by The Beat Freak, Breakmaster Cylinder,
and we are sponsored by some amazing companies.
Support them, they support us.
You know Fastly, Rollbar, and Linode have our back.
Thanks to them.
Thanks for listening.
We'll talk to you in the next decade. Thank you.