PurePerformance - OpenTelemetry from a Contributors perspective with Daniel Dyla and Armin Ruech

Episode Date: April 18, 2022

OpenTelemetry is, for some, the biggest romance story in open source, as it took off with the merger of OpenCensus and OpenTracing. But what is OpenTelemetry from the perspective of a contributor? Listen to this episode and hear it from Daniel Dyla, co-maintainer of OTel JS and member of the W3C Distributed Tracing WG, and Armin Ruech, who is on the Technical Committee focusing on cross-language specifications. They give us insights into what it takes to contribute to and drive an open source project, and give us an update on OpenTelemetry: the current status, what they are working on right now, as well as the near-future improvements they are excited about.

Show Links:
The OpenTelemetry Project: https://opentelemetry.io/
Daniel Dyla: https://engineering.dynatrace.com/persons/daniel-dyla/
Armin Ruech: https://engineering.dynatrace.com/persons/armin-ruech/
List of instrumented libraries: https://opentelemetry.io/registry/
Contribute to OTel: https://opentelemetry.io/docs/contribution-guidelines/
OpenTelemetry tutorials on IsItObservable: https://isitobservable.io/open-telemetry

Transcript
Starting point is 00:00:00 It's time for Pure Performance! Get your stopwatches ready, it's time for Pure Performance with Andy Grabner and Brian Wilson. Hello and welcome to another episode of Pure Performance, everybody. My name is Brian Wilson, and with me, as always, is my trusty sidekick, my lead, my partner, my co-captain, my everything, Andy Grabner. Andy, how are you doing today? Did you just say captain? You know that it's always me who goes captain.

Starting point is 00:00:50 Yeah, no, you always give me crap when I say captain. I was not saying it that way; I'm saying it in terms of, like, a captain of a ship, and we're two ships sailing in the distance or whatever, into the sun. I don't know, there's some stupid saying. It's a little bit romantic there, but we won't get into anything. We don't want to shy away. This is not a romantic episode. We never had, I don't think we've ever had, any hint of romance in the episodes, not even a Valentine's Day episode.
Starting point is 00:01:20 I think today we'll talk about the romance between maybe logs, traces and events coming together on the one roof. You're so good at that, Andy. You're so good at the segues. It's amazing. Absolutely amazing. Ladies and gentlemen, I present Andy Grabner. Thank you. And I was just saying my first company I worked for was actually called Segway Software. We built a performer. So I think Segway, Segway is just kind of in my nature. But let's stop with the introduction and talking about ourselves. Today's topic is open telemetry.
Starting point is 00:01:54 I mean, open telemetry is obviously the role. I liked it. So Armin, this is a perfect way to introduce yourself because you just brought up an even better way of talking. But how can we bring romance into this? Armin you go right away what did you say? Sure yeah you brought up romance and I thought of the marriage of open senses and open tracing and that brings us to open telemetry today's topic. Exactly and And Armin, as you already kind of took over, do you want to maybe start actually with introducing yourself and then we also bring Dan in because I know we actually have two guests today. Sure. So my name is Armin. I'm a team lead
Starting point is 00:02:36 and software engineer at Dynatrace. And as part of that, I contribute a lot to OpenTelemetry where I'm on the technical committee. I started at the company a couple of years ago, at first on the one agent, later on the one agent SDK, which is kind of a proprietary way of extending the agent. And then the natural segue to move on was to go to the OpenTelemetry project and contribute there. Great. And first of all, thanks again for the great intro on kind of segue on OpenTelemetry. OpenTelemetry is a big topic for us. That's why we wanted to make sure we actually get people on the show that not just talk about it in theory, but actually are actively involved
Starting point is 00:03:23 in that important project. And now let's also open up the stage for Dan, Daniel Diler. Hello, welcome to the show. You may introduce yourself quickly. Yeah. Hey there. I'm Dan Diler. I also work for Dynatrace.
Starting point is 00:03:39 I'm an open source architect on Armin's team, on the open source team. I'm primarily working on open telemetry these days, but I also work on the W3C Tracing, Distributed Tracing Working Group, among some other things. And I guess we'll get more into what OpenTelemetry is and what it means later, but I'm on the governance committee there and I also maintain the JavaScript client. Can you quickly fill us in? Because there's a lot of things it seems you do in your day-to-day job. Can you tell us how much time you actually spend in, let's say, open source project governance things like administrative parts, I don't know, being in meetings, in sync meetings, working on Git issues. How much time can you kind of say this? How much time do you spend in the, in sync meetings, working on Git issues, how much time can you kind of say this,
Starting point is 00:04:26 how much time do you spend in the, what I would say administrative part, which is very important or the community driven part versus the actual implementation? Yeah, I guess it's kind of transitioning more and more to the community aspect of it every day. When I first started working on OpenTelemetry, I would say I was coding 80 percent and doing community management 20 percent and now it's almost the opposite. Now it's more like 20 percent coding and 80 percent community management. By that I mean not just stuff like this, but pull request reviews, issue triage, governance topics,
Starting point is 00:05:09 running the meetings, stewarding new members and stuff like that. But this also means then, if I hear this correctly, you obviously have more people contributing and obviously these people then are contributing through pull requests, which need to be reviewed. And so the community itself, the contributing community is growing. And they need some guidance. Yeah, that's great. Yeah, of course. The community is growing.
Starting point is 00:05:33 And the amount of code that I do as a percentage of my time has certainly gone down. But that doesn't mean that the project itself is getting less. When I first started on it, it was really only me and a couple other people on the JavaScript side of things, and it has really grown. So as it grows, it's obviously good for the project, but it means that there's a lot more administrative overhead, I guess you would call it. Yeah. Armin, is there anything else from your end that you would like to add, some from your perspective in the groups that you are actively involved in? So I'm mostly active in the specification, so less coding is involved there.
Starting point is 00:06:14 The only code we touch there is Markdown and some things with the build tooling to make the spec paths and so on. But in the specification, it's a lot of issues and PRs going on there. And as being part of the TC, that also involves some housekeeping around things like GitHub repo organizations, community membership and those kind of things. And it's really, thank you so much for all of this work, because I think this is often from the outset, at least, I think missed that this is very important work,
Starting point is 00:06:51 because with this work, nothing would actually move along. Nothing would be coordinated. And it would be coordinated, and the output wouldn't be what it is. And therefore, thank you for also giving us some insights into what it actually takes to run open source projects successfully. So I think that the governance part is often underestimated the amount of time that will go into that because people think open source is easy. Someone else will write the code for
Starting point is 00:07:19 you on your behalf and you don't have to do anything right um but luckily that that naive approach uh that naive approach is not what what experienced people think of um but some might um and at first i also thought that it was less less governance and less community work um and more progress moving on but in every issue and every pr you have a lot of people with a lot of opinions on it and so in the spec things naturally move a bit bit slower so in the open telemetry specification that is you know looking at this what we just talked about there's a lot of pull requests a lot of work a lot of coordination work going, a lot of making sure people can actually become active and be productive.
Starting point is 00:08:08 What are the other metrics, KPIs, that you guys are looking at to measure the actual output and also compare a little bit how other open source projects are running? Just to give an example, maybe looking at the number of pull requests that are open versus how long do they take. Are there any metrics that we should
Starting point is 00:08:28 be aware of what open source communities actually are looking for? Because when I think about this, a lot of new people are entering the open source space and they want to contribute. And there's 1,000 projects to choose from. Are there any good health indicators of, hey, this is actually
Starting point is 00:08:45 a great run project, which I assume yours are? Yeah. So I guess since I'm the governance committee member, I should take that one. The things that we primarily look at are how long from when an issue is open to uh the first contact from a maintainer and then until it's closed uh and then same for pull requests uh time to the first review and then time to merge or close um it is part of the mandate of the governance committee to look at those things periodically um honestly i wish we looked at them more often. But at this point, I would say we look at it quarterly or twice a year, give or take. And from there, we can decide where identifying the shortcomings is the first step to fixing them, right? So we had, for instance, a year ago, we were having a problem with, I guess, what you would
Starting point is 00:09:50 call throughput. We had a lot of pull requests being opened that would just sit for a really long time. And as the governance committee, it's our job to come up with processes and such that lower that time, at least until someone's contacted. Because as a first contributor, especially first contributor, it's a terrible experience to open an issue or a pull request, and then it just sits for a month and nobody has responded to it. And you just feel ignored, especially when you can see that there are things going on in the project. So you know, there are people working on this. Why are they not interacting with my pull request? And a lot of that comes from just dynamics within the project that you have in most of the what we call special interest groups that maintain a component. You have like a core group of people that contribute regularly.
Starting point is 00:10:51 And you have like, you know, maybe five or six people that are used to seeing each other. They're used to seeing pull requests from each other and stuff like that. And then when someone new comes in, it's just not a part of their regular day-to-day flow. So as the governance committee, we try to see what we can do to bridge that gap a little bit. It's interesting because when you think about it, you're all volunteering, basically. And there's no, I mean, obviously people are taking lead roles and trying to help push this stuff. But number one, it's a matter of how much time can you put into that. And also for anybody who is trying to make improvements, it becomes a lot more challenging in an open source project, I would imagine, to affect change when you don't have the implicit, let's call it authority behind you of being a
Starting point is 00:11:47 hired manager at a company, right? So I imagine there's a lot more diplomacy involved and just even trying to get people to move along. It's a passion project. And Andy, you faced a lot of that within Captain, I think, right? In terms of how do we get it moving along how do we keep the the the enthusiasm and and uh the um the momentum moving so it's uh yes i i think probably thinks most people probably haven't thought of when it comes to open source projects right the real daily running of it so it's interesting to hear this perspective and thank you for sharing that. Yeah. Of course, ideally it's a meritocracy, right? Like whoever is, is doing the best work should, should be, you know, moved up or promoted or whatever it is that you call it. But, uh, the reality is that there is a lot of diplomacy involved and stuff like that. And even if someone is, you know, a genius contributor,
Starting point is 00:12:39 they come in and they make their first PR and nobody knows that they're a genius. So there is definitely like a, a progression that you have to make, make, you know, for your first PRs and such. And most of our like TC and GC members have been around with the project for a really long time. And there are obviously disadvantages to that in terms of getting new contributors in, but there's also advantages. The people that have been around for a really long time are more familiar with the vision of the project and the long-term roadmap and where we see it going and stuff like that, which prevents us from having too many course corrections along the way.
Starting point is 00:13:23 It's a good thing that everybody is moving towards the same goal. And maybe the last question on this before I want to give you the chance to also talk a little bit more about OpenTelemetry. But the last question is, there are clearly people that just come in from anywhere in the world and try to contribute. However, the two of you employed by Dynatrace, you are dedicated on these projects, which means in open source projects
Starting point is 00:13:51 like OpenTelemetry and others, we have organizations that see the value in it and actually contribute engineering resources full time to really drive these things forward. Does this help? I assume so, right? At least it helps to really move things along in the direction that the uh also some of the vendors like we are
Starting point is 00:14:10 really also wanted but you know always in collaboration obviously with the bigger community uh yeah um so i mean i think everybody has this idea of open source projects as being done by volunteers. And in a lot of cases, that's true. But in most cases, I would say, at least for larger projects, you tend to find that most of the people working on the project are working for some company that is interested in moving the project forward. And as a company, there's several ways you can contribute to a project. One way is by giving money. But in a lot of cases, that doesn't really help as much as contributing engineering resources and product management resources and things like that. Money contributions are never
Starting point is 00:15:05 discouraged, but a lot of times they're not the most effective way to help. So you do find a lot of organizations donating engineering resources and things like that. And that's why a lot of the contributors tend to be from organizations. But that said, we definitely do have quite a few contributors that are working in their free time, whether they are just open source, passionate about open source, or whether they are open telemetry users
Starting point is 00:15:38 and their company just doesn't have time to dedicate their own time, but they identify shortcomings and they say, as a user, I want this to be as good as it can possibly be, and things like that. But yeah, the bulk of the contributions do tend to come from organizational resources. Cool.
Starting point is 00:15:59 Well, thanks, both of you, for giving us the insights from the outside in especially to understand what's actually happening when running open source projects. Now, open telemetry, right? The romance between open census and open tracing. I think I need to write this. This is going to become part of a child.
Starting point is 00:16:17 I love child, yeah. Brian and I, we had guests in the past talking about open telemetry, but I think as this project is moving so fast, a lot of things have changed. And therefore, it would really be great for those that listened to our previous episodes, but also those that might be completely new, just to give everybody that is listening an update. Kind of OpenTelemetry from a high-level perspective, what is it? Who is supporting it who is the beneficiary of of it how does it look like if i want to use it and so i would like to pass it over
Starting point is 00:16:51 to either of you to give me a little overview and our listeners obviously about open telemetry so the general high level um concept of it um luckily has not changed since your last episode. It still fulfills the same purpose and it can be described as an API and SDK and other tooling and integrations that are designed for collecting telemetry, processing it and then for exporting it to certain data sinks that can analyze this data. And the beneficiaries are all people that have anything running in production because on the long run the vision would be that everything out there would come instrumented with ideally open telemetry out of the box so that you can just magically monitor everything that you deploy in your environments with
Starting point is 00:17:54 it. I was just going to say, I think the key differentiator, not the differentiator, but one thing to highlight for people new to this concept is this is the collection and exporting of this data it's not the processing or consumption that you still need either a vendor tool like ours to do or another third-party analysis tool to ingest all this data and make sense of it correct the open telemetry piece is the collection and exporting of that data is that a fair fair assessment um yeah it is so there are um there are tools um to to work with that data to analyze it um on in open source there is jaeger for example that can handle traces and obviously vendors
Starting point is 00:18:42 that that have their own backends to import such data. But the OpenTelemetry itself is about collecting and managing and processing or pre-processing the data. And the way I see it, and Brian, I'm sure you see it too, the reason why organizations are really looking forward to OpenTelemetry, at least one of the arguments i hear is people want to become kind of independent of any vendor they want they don't want to build in something uh proprietary and everybody wants to be i think this is also why kubernetes at least in theory is is is great because kubernetes itself is open whether you run it on premise or whether you run it in one of the cloud vendors in their managed offering it doesn't matter you can always move around and you kind of really get this hybrid
Starting point is 00:19:28 cloud becoming a reality and you can just run your containers anywhere and i guess with open telemetry would be the same as long as everything is instrumented with open telemetry you get your metrics you get your logs you get your traces and then you can obviously then switch backend tools as you said you can switch from an open source to commercial you can your traces, and then you can obviously then switch back-end tools, as you said. You can switch from an open source to a commercial. You can switch around, or you can let people decide on what they want to do with the data. One issue, though, I have when people talk about that idea of vendor lock-in, and I know this is probably a bit of a hot topic in some circles, is that if we're talking about everything except for serverless, right? Or some of the brand new stuff, there really is no vendor lock-in because this is not 1997 or 2005 or whatever
Starting point is 00:20:13 year you want to go back to where the only way to instrument the code and collect this data is to build stuff and write stuff into your code. You know, all of the vendors are doing it dynamically. So there really is no lock-in. You're just dynamically capturing the data with one agent versus another. But whenever vendor lock-in is brought up, it's always done in this boogeyman style of, oh, you're going to have to instrument your code. That's like such an old-fashioned view. It just kills me. Obviously, when we start going into things like serverless, where there are no agents and things like that, this becomes a reality, right? But most people are talking about open telemetry in combination
Starting point is 00:20:48 with kubernetes which is still talking about jvms containers and everything and i just wanted to get out that off my chest because it really bothers me when i hear the whole vendor lock inside i see the point they're making but they're using like an ancient argument against it which bothers me well maybe you guys have a different view on that because obviously I'm only getting it from my perspective. Yeah, of course. I would say that there are high-quality agents that do instrument automatically and stuff like that.
Starting point is 00:21:19 And in a lot of cases, they work really well. But there's always corner cases and edge cases. Like serverless is one that you brought up, but cases they work really well. But there's always like corner cases and edge cases, like serverless is one that you brought up, but there are others as well, where automatically instrumenting applications either misses some key component that's maybe not missed in a technical perspective, but like is important to your particular business case, right?
Starting point is 00:21:46 Like a metric isn't captured that is important to your business, but is maybe not common from a technical perspective. Or maybe some API is particularly difficult to observe from the outside. What really makes open telemetry different, and I think what the core value proposition of it is that because it's open source, you could have libraries and open source applications and stuff like that that build it in as first-party support. So when you're looking from the outside, you can, of course, capture a lot of things. But if you're the developer of a library,
Starting point is 00:22:31 you're always going to know what's important at a much deeper level than could ever be done by an outside agent or something like that. So if you have a database driver, for example, that builds in first party support for open telemetry, the data is most likely more reliable, more stable, and more resistant to changes and updates in the library. For example, if a new version of a database driver is released and some outside instrumentation breaks, whereas a first-party built-in instrumentation
Starting point is 00:23:11 would be considered as a part of that update. Yeah, those are fantastic arguments for open telemetry. I'm not against any of that. I just don't see the vendor lock-in side. But that's more a feature function and benefit and more out of the box but anyway maybe to the vendor lock and i just have one comment of this where i can also see it valid let's say you have one apm tool and it provides you let's say 10 metrics and you need them and then you're switching to another tool and you don't get the
Starting point is 00:23:40 same metrics it means you're kind of locked in. Because in order to get out of it, you need to figure out how to get this data that you always had in the other tool with the new tool. If you only base everything on OpenTelemetry, you don't care about the tool because you know that you always get the data, however, whatever backend system you have. I think this is one of the arguments with the lock-in
Starting point is 00:24:04 that I could understand and I see. I got a question now to both of you. I think the way now that I see it, if I'm looking at it again from a very naive perspective, I have the option that developers instrument their code and we encourage them to build libraries or their own applications because they know what's best then i can also see that if i am a developer and i'm using third-party libraries hopefully somebody instrumented it but what about uh is what about the status of dynamic instrumentation because i believe that already exists at least for some technologies so we have
Starting point is 00:24:42 some automated agents in OpenTelemetry that also inject OpenTelemetry. Can any one of you fill us in here? Because maybe I don't want to instrument my code manually. And I think that option also exists. And if you're not the experts, then feel free to say this, but at least I think there are some auto instrumentation options
Starting point is 00:25:03 available. Just been interested to hear an update. Yeah, I think everybody kind of has a different definition of what auto instrumentation means. And it sounds like what you're referring to is something where I don't have to modify my code at all, right? And I just, like, it's automatically injected. I think, and maybe Armin can correct me if I'm wrong here, I think Java has some concept of this in OpenTelemetry and possibly.NET, but it's honestly not a very mature story there in OpenTelemetry yet. For the most part, in most of the language implementations, you do need to do some level of manual work.
Starting point is 00:25:46 Most of the time what that means is during my application startup, I need to configure OpenTelemetry and configure instrumentations and stuff like that. But I don't need to then go and modify every single class in my application. For example, in JavaScript, I can enable the HTTP instrumentation and the Express instrumentation,
Starting point is 00:26:12 and then anywhere where I make or receive an HTTP call and my Express application and router and all that are all automatically instrumented. It's only like five lines of code to get all of that. So it's semi-automatic, I guess you would call it. But from a developer perspective, this feels automated, right? I don't need to go into all of my,
Starting point is 00:26:37 into my code and find the places. So, yeah. Yeah, at least you get some go-gons. You probably have more, better things to say than I do. I was just going to say, of course, we want it to be as automatic as possible. Ideally, you wouldn't need to change your code at all, configure anything, and we would just figure it out. But as the project matures, it'll hopefully get closer to that ideal scenario, but we're just not quite there yet. Yeah, the Java agent is certainly one of the closest there because the JVM has built-in support or built-in hooks for monitoring agents.
Starting point is 00:27:15 And so you don't even have to add a line and then recompile the entire app again, but you can just pass the auto-instrumentation agent on the startup of the Java virtual machine, and then it wires up things automatically. And that's quite close to fully automatic instrumentation already. And I wanted to ask when it comes to, harking back slightly to vendor lock-in,
Starting point is 00:27:49 when we talk about languages like Python and Ruby, where for the most part that is SDK work, that is manually instrumenting your code, with this concept of automatic instrumentation, do you see anything on the horizon where it's, is it either just run this batch script against your code and it's going to instrument the common things, or is there going to be some function built into some languages like that where it's unmanaged code that it's at least going to be able to pick up the ingresses and egresses and complete a span
Starting point is 00:28:23 through that? What are those, you are those non-managed code bases look like on the landscape of automatic instrumentation? Well, I mean, on an infinite time scale and assuming the success of the project and all of that, we would hope that it's built in as many places as possible. One really great example of this is actually.NET, which has built in first party support for open telemetry directly into the runtime. So far, that's the only place where that's true.
Starting point is 00:28:57 And it is really cool. It kind of does feel like magic when everything works the way that it's supposed to. And the more success the project has and the more users it has, the more incentivized library developers and such will be to build it in. And then once you attain some critical mass of libraries that have built in support, it then makes sense for the runtime maintainers to look and say, our community, this is clearly something they care about, and maybe we can build in, if not automatic support,
Starting point is 00:29:31 then at least some features that enable it to be used more easily and stuff like that. I think this is all really long time-scale stuff. None of it's coming next year necessarily, but that is the long-term vision of the project. I would like to loop Armin a little bit into the loop. Armin, do we miss anything here when we talk about, because we initially, when we prepared for the talk, we wanted to make sure we give a good overview of where OpenTelemetry stands, where it came from. Anything else we miss that the audience should understand?
Starting point is 00:30:09 Maybe also where it came from so that we understand where we are. Sure. So to the history of OpenTelemetry, there were two competing open source vendor agnostic instrumentation libraries one being open tracing and one being open sensors also with their with their upsides and downsides and then i think i think open tracing was first published in 2016 and open sensorsensus two years later, like 2018. And then in 2019, that's also when we came to the project, they were merged together and joined forces on OpenTelemetry. That's where we are right now.
Starting point is 00:31:03 Yeah. And is there still, I mean, just out of, because I don't know, is there still some OpenTracing and OpenCensus out there because somebody put it in back then and they're still maintaining it somehow, or is this all converted over? No, it's not, not converted over everywhere yet. So there will certainly be quite a few deployments with OpenTracing out there. But what OpenTelemetry provides and calls shims is some backwards compatibility support
Starting point is 00:31:33 with both OpenTracing and OpenSensors to some extent. And one question that I have, because at least Brian and I, we've been doing this instrumentation with our product for a while and I remember in the early days we always had the challenge to not misuse the power of instrumentation which means in the old days we had the option to configure rules to instrument every method in a certain package, right? And we call it the shotgun instrumentation, which meant we were just collecting a lot of information that nobody really needed
Starting point is 00:32:12 unless you were to develop on the local machine and really wanted to have everything, but then it broke everything later on. Are there, from your community work, are there best practices or are there any approaches how you can make sure that as a developer, I'm not instrumenting too many things? Because too much of something doesn't help anybody either. But if it becomes either overhead or also if you're capturing things that might even be information that is confidential, is there any best practices or any discussion in this area in the community?
Starting point is 00:32:46 So if you're instrumenting things manually, if you're instrumenting your application or your library, then you will best know what kind of information is interesting and relevant. So you can identify the operations that you want to track and then only cover these. So that's probably less of an issue. And for instrumentations that are provided, for example, by OpenTelemetry as separate instrumentation libraries,
Starting point is 00:33:18 that is also taken care of. So when an HTTP client library is instrumented or an HTTP server library, then of course the focus is on the requests there and less on the internals. So the granularity there should be manageable. And are you just aware, because Brian and I were both performance enthusiasts, do you know if, especially for those libraries that are then heavily used in all sorts of applications, is there any performance testing that
Starting point is 00:33:50 is done instrumented versus un-instrumented? And just in case you happen to know if there's anything happening there. I've seen some benchmarks in their respective language implementation repositories being performed. But I don't know if that's employed everywhere. Yeah, I'm curious. I don't know if this is under any of your realms of expertise in this area,
Starting point is 00:34:18 but Andy, you mentioned exposing data. So I'm curious as to what the OpenTelemetry security thought process is, right? If either in an OpenTelemetry project, I can possibly do this, or as a vendor, I might sneak into my code something that's going to expose something. Is there any sort of security review when these libraries maybe come through or is it just hey I take our word for it it's good like what's what's that risk that you might see I mean obviously there's risk with anything you use vendor or not is there any I don't know quite the word but any any approach to that within the open
Starting point is 00:35:02 telemetry project? So the OpenTelemetry specification defines, let's say, a data model on how the collected telemetry should look like. We call these the semantic conventions. And they were written with that in mind so that any potentially sensitive data should not be captured. But of course, you can encode, I don't know, credit card numbers and passwords in your URL query parameters or whatnot.
Starting point is 00:35:41 So those have to be filled with care. But there is no inherent way of prohibiting or preventing any such sensitive data from being exposed in open telemetry. That's something that the respective backends are to take care of. So you can send your data to a back-end and then it's up to the back-end to make sure that only people who should have access to such data do have access the point of collecting it. So when you instrument your code, you would say I want to add this attribute to my trace or to my span and I define this one as being sensitive, but that's in an early stage and just ideas floating around there. Cool. Guys, this now goes to both of you and whoever wants to answer first.
Starting point is 00:36:56 I know you've been both involved in that project for a while now and we kind of learned now, again, the history and where we are. But what's more interesting not more but also very interesting is what's happening now and what's what are you guys excited about in what's upcoming what are the new cool things or the things that are the innovation that is happening the maybe the new features that are coming in let's say in the next three six to 12 months that will be beneficial for the community that the community has been asking for?
Starting point is 00:37:28 And whoever wants to go first. Sure. So what's currently going on is that the metric specification was in large chunks rewritten. So we know that OpenTelemetry consists of three pillars, one being traces, one being metrics, and one being logs. Traces have been out for a while now, and now it was time to finish up the metrics back. It was actually just marked as stable last week. So that's part of the OpenTelemetry feature lifecycle.
Starting point is 00:38:07 Now, all of the language SDK implementations are catching up with the latest specification. In the upcoming weeks, we should already see the first OpenTelemetry SDKs with a stable metric support being published. And that's nice to see. It took us quite a while, so it was underestimated at first how long it will take. So the roadmap that was anticipated initially would have reached some stability in the metric space in
Starting point is 00:38:50 2021 already towards the end of the year, but that was not achievable. But now we are there and no road bumps ahead that we would be aware of at this point. Yeah, I think when we first started working on metrics, everybody looked at the community of contributors and said, we have a lot of really smart people from a lot of companies with a lot of experience, with a lot of good ideas, and everybody generally knows what they're doing and we should get this banged out pretty quick. And then what happened was when we started actually working on it, we had a lot of really
Starting point is 00:39:29 smart people with a lot of really good ideas. And there ended up being a lot of debate about which good ideas should be, you know, because some of them, you can have two ways to solve a problem that are both good ways but have trade-offs. And a lot of times those debates just take longer than you expect because both sides have merit. It's much easier to settle a debate when one side is clearly wrong or one side is clearly way better. But when you have, honestly, a really smart community that cares a lot, there are a lot of good solutions. And that can be as much of a cause of delays as mistakes that you've made.
Starting point is 00:40:16 Yeah. It's the benefit and the pros and the cons of a democracy. Yeah, we lovingly refer to it as the open source tax. Yeah. And any future work? I mean, anything else we should highlight that's beyond? It's great, obviously, that the metrics API is now, the spec is stable and we see the SDKs.
Starting point is 00:40:44 Anything else people, if we, if we would speak again, and let's say three to six months, and then you would say, yes, and this is not here because we're getting close to this, but what else is coming in the future? Yeah. So of course, Armin already said, the metric stability is a huge thing that's happening. Like literally right now, we were just in a meeting talking about it. And in the next three months, six months, year, we're going to see more development of metrics. There were some features that were left out of the initial interest of stability and timelines that we do want to finish and get out.
Starting point is 00:41:30 So one example of that might be exemplars and metrics, which is a way of connecting your metrics and traces together so that your metrics are, I guess, what you would call tracing aware, which we're really excited about. Potentially, like a hints API, which would be if I'm a library developer and I'm instrumenting my own code, I can provide a hint to the way that the SDK may be configured when the defaults maybe won't work for this use case. a hint to the way that the SDK may be configured, when the defaults maybe won't work for this use case,
Starting point is 00:42:08 I can suggest different defaults. But then as the final end user, they can always override that hint. So that's why we call it the Hints API. Then similarly to making metrics tracing aware, there's been some work to do the same with logs, to apply trace and span IDs to logs in order to make them also tracing aware.
Starting point is 00:42:36 I guess, again, talking more about long-term timelines, the goal of the project is to not have separate traces and metrics and logs, but to have one interconnected stream of data. A smart scheme, if you will. Yeah, exactly. Armin, something you wanted to add?
Starting point is 00:43:06 One thing that will be added to tracing, for example, is also the consistent trace sampling. So consistent sampling of distributed traces across multiple processes or multiple realms with potentially differing sampling rates. That is also something that's currently in an experimental state and being prototyped. Right, and if somebody wants to see
Starting point is 00:43:36 all the things in action, now our colleague, Henry Grexet, he has his YouTube channel, Is It Observable? And he recently did some sessions actually on open telemetry where he also explains first of all visually very nice but then he also has some great tutorials so folks if you want to see what open telemetry really looks like and how you can get this done on a sample app just check out is it observable as well armand daniel i know you are both obviously relying on on people that actually contribute to the project and i think a community needs to grow and therefore i want to give you also the chance to say what and where should people go to in case
Starting point is 00:44:22 they now listen to us and say hey this sounds like an interesting project are there good places to start with if they want to contribute in whatever form right because there's code contribution i guess there's documentation contribution so any any thoughts and where to get started so one great way of contributing would be to look up the project get started with with any tutorial that looks appealing to you and then trying it out. And if there is anything that you notice that you don't like, then bring it up with us. So feedback, user feedback is on high demand in OpenTelemetry
Starting point is 00:45:00 and we would love to hear back from actual users trying it out. There is the CNCF slack that everyone can sign up for with a bunch of OTEL prefixed slack channels where you can reach out to the respective SDK maintainers for any questions or any issues. And of course, if something looks like a bug or anything, then make sure to raise an issue in the respective repository. That would be one great way of getting started with any kind of contribution.
Starting point is 00:45:37 Then of course, as always, documentation is something that can be worked on. And if you feel ready for the first code contribution, then you could look for the respective SDK implementations and you will find, usually you will find some issues labeled with help wanted or good first issue, although those are not always applied, but you can get started by just mentioning
Starting point is 00:46:09 that you would like to work on something and if no one is already working on it in parallel or if you think you can do it in a nicer way, then you can get started with that. It's amazing how fast time flies. I want to also be respective to everyone's time here, but i also want to make sure we didn't miss anything that we want our listeners to know so dan armin any any things we missed anything you want to make sure that people are still still know about OpenTelemetry? No, I think we did a pretty good job of covering
Starting point is 00:46:54 the history of the project, where we're at, and where we're moving forward to, as well as how people can get involved. I would stress from a getting involved standpoint, what Armen mentioned about users and providing feedback. I think a lot of people what armin mentioned about about users and providing feedback i think a lot of people when they think about how can i contribute to an open source project they always tend to think about how can i make code contributions or how can i how can i actually you know make changes to the project but really the most important community members are the users. So just using the project and providing feedback is immensely valuable.
Starting point is 00:47:32 And I think the value of that is often lost or at least not talked about. Where we can build whatever we think is cool, but if we never hear back from users that they like this, or they don't like this, or they really have this gap that we need to fill, then all we're doing is building what we think people want or would use. For me, the most valuable contributions are honestly just people using the project and then telling us where it can be improved. That lack of feedback is how you end up with Microsoft Bob.
Starting point is 00:48:14 How you end up with lots of things. Now Armin needs to look up Microsoft Bob. I saw his face. Yeah, it's not the paper clip that shows up in Birdish. It came out of that, I believe. I believe Clippy was if you go back to the Windows 98 theme packs,
Starting point is 00:48:35 that was what ended up coming out of Microsoft Bob. I will hear no blasphemy against Clippy. Oh, we all love Clippy. Because if we didn't have Clippy, who could we complain about? You know, the thing I'm most excited about, just wrapping up my last thought on this,
Starting point is 00:48:52 I'm most excited to see the third-party integrations, you know, where vendors are baking OpenTelemetry into their code, into their packages, so that everyone can use it i'm what i'm really curious to see is if there are any of the historically obtuse vendors i won't
Starting point is 00:49:11 name names but we all know some of them are out there will refuse to well they'll either refuse to do or put open telemetry into their code to help everybody out to monitor what's going inside or if someone's going inside. Or if someone's going to be like, oh, this is a great idea, but we're going to do our own proprietary version of baking instrumentation into our code just because we, you know, companies do that. It's going to be interesting to see how the political landscape of the larger code vendors or package vendors plays out. Obviously, when you're dealing with more open source-based projects and packages of code,
Starting point is 00:49:47 you're not going to have that. But there are these large companies that maintain a lot of their own code. And I'm interested to see how they play along with that. Which ones really embrace it and say, we have nothing to hide, and other ones who are going to be like, we're not going to let you see this. So you're talking not at this point about observability vendors but about application vendors yes yeah i completely agree um but i do think that as open telemetry gets more momentum uh it will be uh more and more difficult to say we're going to do our own thing. Because as more tracing backends support it and more platforms support it,
Starting point is 00:50:29 it hopefully will become an obvious value. And even a competitive disadvantage if you don't. Exactly. Because in the end, if the consumer demands it, the consumer of the software, because somebody needs to run it, and then it's a troubleshooter and said, hey, I can go with you or with the other one. And the other one gives me better ways of operating your software than I go with the other one, because that's more valuable for me, maybe.
Starting point is 00:50:54 Yeah. There's always that behemoth mentality, though, of, again, vendor lock-in idea that, you know, you're stuck using us and we'll sell you services to analyze the code instead of allowing you to do yours. Anyway, it'll be interesting to see how it plays out. I hope it doesn't go that route, but you know, there's one company I have in mind in particular, which I won't say that I can see doing that kind of stuff, but you know,
Starting point is 00:51:14 hopefully not, hopefully I'll be proven wrong. Awesome. Hey, Dan, Armin, thank you so much for being on the show. And thank you for having us. And thanks for doing your work on OpenTelemetry because you make all of our lives easier in the end. I'm glad to hear it. Thank you. And Brian, I hope I can also speak for yourself,
Starting point is 00:51:36 but we want to make sure we invite them back in a couple of months to see what's happening. Andy, speaking of the thanks, Andy and I come from a world where we always wish people took performance seriously. And it's amazing to see that there's this huge project now where people are taking performance seriously and things are going to be starting to be baked in eventually and all.
Starting point is 00:51:56 So from that perspective, I thank you as well because it's always been a passion of ours to see stuff working properly. So thanks to you guys and everyone else on the community and everyone using it and providing feedback. Yeah, we look forward to having you back. And anybody has any questions or comments? Andy, was there something else you want to say?
Starting point is 00:52:17 I can see you looking. Okay, he had a look of anticipation. The problem with video. Thank you, everyone, for listening. If you have any questions, comments, feedback, pure underscore DT on Twitter or email us at pureperformanceatdianatrace.com. We'll have a bunch of links to a lot of the things that Andy mentioned in the show in the show notes. So thanks again for listening and making this all possible, everyone. And thank you to Armin and Daniel for being on our show today. Thanks, everyone.
Starting point is 00:52:42 Bye- Bye. Bye.
