PurePerformance - Unlocking the Power of OpenTelemetry: Insights from an OTel Expert at NWM
Episode Date: May 8, 2023

36 million generated OpenTelemetry spans per hour for GraphQL based queries – that's just one of the stats we discussed with Justin Scherer, Sr. Developer and Consultant, who is leading OTel adoption and Shift-Left observability efforts at NWM. For Justin, OpenTelemetry helps commoditize data gathering in modern cloud native environments so that the backend observability platform of choice can focus on answering higher level business impacting questions. If you are about to roll out OpenTelemetry in your organization, then take the advice from Justin, such as: Bring business leaders early into the discussion! Engage with the OpenTelemetry community! Understand what your observability platform already gives you and focus on the gaps!

To learn more about OpenTelemetry, check out some of the links we discussed during the podcast:
OpenTelemetry Website: https://opentelemetry.io/
IsItObservable: https://isitobservable.io/open-telemetry
Podcast: https://www.spreaker.com/user/pureperformance/adopting-open-observability-across-your-
LinkedIn Profile: https://www.linkedin.com/in/justin-scherer-198126160/
Transcript
It's time for Pure Performance!
Get your stopwatches ready, it's time for Pure Performance!
As you probably noticed, this is not the voice of Brian Wilson.
It's the voice of Andy Grabner.
Brian is not here today.
He is, well, stuck.
I don't think stuck is the right word, but he is actually in Florida enjoying Dynatrace sales kickoff,
getting to learn everything that's new in our world so that he can sell better Dynatrace.
But we have a great episode today because we have a great guest today that covers an
important topic for all of you.
And the topic today is actually shift-left observability.
We will talk a lot about open telemetry.
I just came back from KubeCon Amsterdam, which was fantastic.
Learned a lot about what the latest trends are.
Really seeing a big, big boost of OpenTelemetry,
even though it seems like it's been booming for quite a while now.
But I invited Justin Scherer to the podcast today.
Justin, thank you so much for being on the podcast. I know
you've been adopting OpenTelemetry in your organization and you're a big advocate of
shifting left observability data. Now, I don't want to just talk on my end. So first of all,
Justin, could you quickly introduce yourself, who you are and what you do in your current role?
And then I want to dive into the topic because I want to learn from you
some of the adoption reasons, adoption challenges,
and best practices we can learn from you.
Yeah. Hi. Yeah, I'm Justin.
I work for Northwestern Mutual.
For those that don't know,
it's a financial,
basically a tool, a financial suite our financial advisors use to help do planning for our customers and figure out the right financial instruments for them to invest in, which means that there is a lot of data and a lot of various functionality in our system
to help our financial advisors,
best way of saying it, advise our customers.
I'm a developer on what's known as our illustration system.
So when you get that nice fancy printout
of all of the various numbers
and various things that make up a policy.
That's what we work on.
And specifically, I'm a dev on our backend system, but I also kind of help on the front end and I'm on performance, so a lot of various aspects of our system to try and help not only deliver features for the business,
but also make sure that we're up and running and trying to keep our 99.69% uptime.
When you explain this, it almost sounds like you have a lot of hats to wear.
If you are also responsible or helping at least with keeping the systems up and running,
isn't that also like an SRE function that you then have or do you support SREs? How does it work?
Yeah, so I'm not like officially on our SRE team. We have kind of our enterprise SRE, but in terms of our illustration system, I'm very much
in this kind of off-on-the-side group, where I'm always looking at our Kubernetes systems,
looking at how we hook into our cloud provider and making sure that that system is not only
right size, but also that if there is downtime in some capacity, we're figuring
out right away.
So yeah, kind of really what would be considered SRE.
Hey, and Justin, so one of the reasons why I wanted to get you on the podcast is because you
have been walking down the path that many in the industry are currently walking down,
meaning Kubernetes seems to be becoming the standard,
obviously, core platform for the platforms that we are building.
It's a complex system.
It gives us a lot of opportunity,
but it's a complex system and
complex systems even need more observability.
Now, OpenTelemetry is the most successful project,
I think, these days in the CNCF ecosystem.
So for those people that don't know OpenTelemetry, it's an open standard
that actually defines how observability platforms can consume data
or can observe metrics, logs, and traces.
And so, Justin, what I would like to hear from you,
why OpenTelemetry for you on Kubernetes?
What problems does it solve for you?
Because, you know, obviously, I represent one of the vendors and there's many other vendors out there.
We've been doing observability for many, many years.
Yet, there's people like you and people that I spoke to last week at KubeCon, and they all want to dig into OpenTelemetry. Why is this? What problem
does OpenTelemetry solve for you? So OpenTelemetry,
it solves, I would say, for us, it's really a two or three prong approach.
One, when you're an enterprise, or even when you're just a developer, you don't necessarily want to be tied to a vendor, right?
One of the things that you always want to try and maintain is this kind of like this agnostic approach, if you can, to at least pulling data.
And OpenTelemetry gives that by being an open standard and almost every vendor out there in terms of observability,
they support open telemetry data.
And so if we can remain agnostic and if, let's say, an observability platform is just not meeting the needs of that enterprise or that company,
it makes it just a little bit easier to try and shift. And I think that's one thing I kind of mentioned to people
I've talked to about OpenTelemetry is that what used to be, let's say, even just a decade ago,
was the highlight for observability platforms was this idea of, oh, we can now ingest traces,
or we can ingest metrics or stuff like that. Now that should be seen as the mundane, and that should be seen as this base level.
And that's what OpenTelemetry kind of gives you, this base level to where now we're trying to elevate observability platforms to say,
hey, you don't need to worry about X problem anymore.
I now need you to worry about giving me context to this data.
Or I want you to somehow figure out how does this trace or this
piece of data relate to this piece of data, or how can I query on it? Things like that.
And I think that's the first prong of open telemetry is really starting to,
let's get rid of people worrying about the mundane and let's get us now starting to work
on advanced problems. I will say the second prong, the major prong for us at least,
was no matter what observability platform
is out there, they're not going to be able to keep up with technologies.
I mean, just to give the example, we do use
GraphQL. And for those that don't know, GraphQL is a different way
of querying data. Some people have called it the new REST.
Really, what it does is allow consumers to query only the pieces of data they want.
But because of this kind of new approach, you have this single URL, a single HTTP verb,
with every single piece of data that you can imagine stuck in this POST.
And because of that, we've been so used to REST for so long that most observability platforms don't understand that piece of it.
Even when you get into errors, errors are handled differently.
It's an HTTP status code of 200, and the error is inside the body.
And so open telemetry allows this kind of neutral approach to say, hey, we get it.
There's brand new technology coming out every single week.
Let us handle this new tech, be it tRPC, be it GraphQL, be it whatever springs up, and we can fill
in those blanks. And that's really what we've been able to do: OpenTelemetry has allowed us to
fill in all the blanks, because I just can't imagine an observability platform being able
to support absolutely every piece of technology that comes out.
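To make the GraphQL point concrete, here is a minimal sketch (not code from the show) of how a client-side span might flag a GraphQL error that hides behind an HTTP 200. The endpoint, tracer name, and query handling are hypothetical; it assumes the @opentelemetry/api package and Node 18+ for the built-in fetch.

```typescript
// Why HTTP status alone is not enough for GraphQL: the transport says 200,
// but the failure lives inside the response body.
import { trace, SpanStatusCode } from '@opentelemetry/api';

async function queryGraphQL(query: string, variables: Record<string, unknown>) {
  const span = trace.getTracer('graphql-client').startSpan('graphql.query');
  try {
    const res = await fetch('https://example.com/graphql', { // hypothetical endpoint
      method: 'POST',                                        // single verb, single URL
      headers: { 'content-type': 'application/json' },
      body: JSON.stringify({ query, variables }),
    });
    const payload = await res.json();                        // res.status is 200 even on failure
    if (payload.errors?.length) {
      // Surface the GraphQL-level failure on the span, since the HTTP layer won't.
      span.setStatus({ code: SpanStatusCode.ERROR, message: payload.errors[0].message });
    }
    return payload.data;
  } finally {
    span.end();
  }
}
```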
Yeah, and obviously I'm here representing one of the vendors, and I completely agree with you. While we have built agents and are still building agents to cover a big technology
breadth, we are really happy about OpenTelemetry because it also makes our life easier because it
actually pushes a lot of the reverse engineering we used to do over the years
to the vendors of frameworks, of runtimes,
or the developers of custom code.
They know best what it is that we want.
Especially the GraphQL, this is really fascinating.
I didn't know that GraphQL works like this,
that basically everything that comes back
is an HTTP 200 call,
as long as the call is obviously successful.
But then in the body, you have the information about
did this query actually return any data?
Was there, I guess, I don't know, a mistake in the query language
that you used or in the query parameters?
And with that, obviously, it makes sense that you are then
using OpenTelemetry to get additional context
out of that individual transaction.
Exactly.
Hey, Justin, one more word on GraphQL, because I'm looking here at a presentation that you
did at Dynatrace Perform.
And if I'm just quoting you, you said one trace can be over 1,000 spans big.
So that's a lot of depth of information. Also, it seems one of your GraphQL entry points
gets upwards of 60,000 requests per hour.
So you also have quite some load on the system.
Traces move between 10 different microservices.
So this all really shows us that we are really truly living
in a distributed world where distributed tracing
that OpenTelemetry provides is important.
Do you have any other things, especially for folks listening in and using GraphQL that
were kind of surprising for you or it was important for you to kind of pass on?
Yeah, so one of the big things that you can do with GraphQL is this idea of creating, there's kind of
these two competing viewpoints of how to potentially create what's called a supergraph, which is
essentially just delegating to separate GraphQL services.
And for your consumer to just see this one entry point, most people can think of this
like an API gateway.
There's kind of two competing views: one's called stitching, and the other is a gateway,
Apollo Gateway. And really,
this is another area that can be a major problem, because what
to a consumer looks like an X query could
actually be five queries under the hood, and it's all separating out.
And so when you get potentially issues that crop up with the consumer and they give feedback and say, hey, X query isn't necessarily working.
And if you don't own that piece of the code, you're going to be like, I don't get what you're saying to me right now. So with something like open telemetry
or giving you this observability piece,
you can now see, oh, I can look at that query,
see it's coming to my service,
and I'm potentially the piece that could be broken,
and I can now see that.
So when you're getting support calls or service calls,
it's just a lot easier when you have that piece added in, because otherwise you're going to get the kind of deer in the headlights look of, cool, that's a problem.
You know what, this really reminds me, and I'm pretty sure if Brian would be on the call now, he would jump in and say, hey, the problem that it's just explaining, like one request is coming in on the front end and then it's splitting up into multiple requests
to the backend.
It reminds me a lot of the N plus one query problem
we've been talking about for so many years
as a pattern, where typically
a backend component makes a lot of round trips
to the backend database to fetch more data.
It seems the same is happening here with GraphQL.
And it can turn into, N plus one is actually a major issue in GraphQL. And it can be
exponentially worse across the board, because N plus one is usually very microscopic,
because you kind of look at the microservice or the gateway and see that
it's doing N plus one. But the problem is, if in your stack you're using a technology that can have an
N plus one problem, it just keeps compounding down. And exactly seeing those issues,
without your observability, you probably won't notice that there's an N plus one problem until
it's too late. Yeah. And I think this really reminds me,
you know,
back in the days, I started with Dynatrace 15 years ago.
And I remember in the very early days of my career in observability,
we looked at the Hibernate framework,
but Hibernate,
I'm not sure if that rings a bell for you or for some other folks that are
listening,
right?
But it's a very popular framework for data access,
basically an OR mapper.
And it was eye-opening
for many developers to see
a distributed trace showing
that accessing an
object was all of a sudden executing
hundreds of thousands of
database statements because every single
referenced object, like a list or so,
was fetched individually.
It seems now with GraphQL,
it's a different technology, but it's the same
problem because you're
making it very easy for the consumer
to do something,
and then you're using GraphQL, and then you don't know what
GraphQL is really doing. In the end, it gets you your data,
but I think you really need to understand
and look at traces to
figure out,
is the transaction that I'm triggering efficient or not efficient?
And I think this comes back to your point of shifting left. I think enabling developers to see what is actually happening when they're executing these queries is very important.
Yeah. So funny story.
You mentioned Hibernate. At a previous company, I actually had to go through some of our Hibernate pieces because
they were doing exactly what you said.
And to give this kind of like, here was the different approach of we didn't really have
an observability platform there.
So that was actually digging into code and looking, oh, look, we just hit 150 query get requests or selects because Hibernate decided this was the best approach to do it.
And that's digging into Hibernate code, which is for anyone that has ever dug into Hibernate code is not very pretty to look at. But shifting then, like you said, shifting left, when you're now here, when I'm at my current
place and you have GraphQL and I have the observability piece and I don't need to dig
into GraphQL anymore. I can just look at the traces or the metrics or whatever, and I can see,
oh, there is an N plus one problem because the traces show I'm hitting either a hundred distributed
traces or I hit this microservice 15 times and the trace just shows it bouncing back and forth.
So I think that's a major difference between the two approaches of, I don't necessarily want to
dig into a library's code, aka the Hibernate, me looking at Hibernate, versus I just want to look at traces
because I may be able to figure this out right away
and then look at just my code to see
could I have done something differently.
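As a concrete illustration of the pattern Justin and Andy are describing, here is a hypothetical pair of GraphQL resolvers (not code from either of their systems) that produces the classic N plus one shape; the db handle, table names, and types are made up for the sketch.

```typescript
// Hypothetical resolvers showing the N+1 shape that a trace makes obvious:
// one span for the list query, then one child span per item fetched individually.
declare const db: { query(sql: string, params?: unknown[]): Promise<any[]> }; // stand-in data access layer

interface Policy { id: string; ownerId: string }

export const resolvers = {
  Query: {
    // One database round trip for the list...
    policies: (): Promise<Policy[]> => db.query('SELECT * FROM policies') as Promise<Policy[]>,
  },
  Policy: {
    // ...then this resolver runs once per policy returned above: N extra round trips.
    owner: (policy: Policy) =>
      db.query('SELECT * FROM customers WHERE id = $1', [policy.ownerId]),
  },
};
```

In a trace this shows up as one resolver span fanning out into N near-identical database child spans, which is exactly the signal that makes the problem visible without digging into library code; batching the per-item lookups (a DataLoader-style pattern) is the usual fix, and the trace shows immediately whether it worked.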
And Justin, also from my understanding
and for the listeners, so GraphQL,
I assume there is like a lot of libraries
that developers use, standard libraries,
like client libraries.
Are these already instrumented with OpenTelemetry, most of them?
Or do you have to go in as a developer
when you use GraphQL to then add your traces?
So I will say it's highly dependent on the language.
But in terms of JavaScript, the telemetry or the library that wraps
the base GraphQL library has already been done.
So be it you use, let's say, Apollo Client,
which is a library, a wrapper around the main base library,
or you use what's called GraphQL Yoga or something like that,
it's already instrumented because all of them still use this base implementation under the hood.
So when you have something like that, it's amazing, because in JavaScript land you can
basically use any client you want and adding this tiny wrapper instantly starts instrumenting for
you. When you get into other things, something like C Sharp, it's really highly dependent on the server.
So there's kind of two big implementations in that realm where you have GraphQL.NET and HotChocolate,
and both of them have different implementations.
And then even in Java, I know there's a couple implementations that are out there.
So it is highly language dependent, but I will say at least the languages
that I've looked into,
Rust, Java, C#, JavaScript, there are implementations.
And OpenTelemetry has basically taken over those GraphQL fields.
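For reference, wiring that up in Node.js can be as small as the following sketch; the package names and options come from the public OpenTelemetry JS contrib packages and may differ between versions.

```typescript
// A minimal sketch of hooking the upstream GraphQL instrumentation in Node.js.
import { registerInstrumentations } from '@opentelemetry/instrumentation';
import { GraphQLInstrumentation } from '@opentelemetry/instrumentation-graphql';

registerInstrumentations({
  instrumentations: [
    // Patches the base `graphql` package, so servers built on top of it
    // (Apollo Server, GraphQL Yoga, etc.) get resolver-level spans without app changes.
    new GraphQLInstrumentation({
      depth: 2,         // limit how deep into the selection set spans are created
      mergeItems: true, // collapse list items into a single span to keep traces readable
    }),
  ],
});
```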
Cool.
Let me take a step back into something you said in the very beginning.
You said that OpenTelemetry has kind of, I would say, commoditized, or helped to commoditize, how we get the data,
right? I think it allows us to elevate the discussion from saying, I need
metrics, I need this metric and this trace and this piece of information, to, well, let's assume we have
this data because it just comes in, and that's what you assume,
to now changing the discussion towards,
hey, I want to actually give answers
to particular questions like,
you know, why am I hitting my database so heavily?
Or why am I hitting my backend services so heavily?
Why do they cost so much after the recent update?
Are these, you know, the GraphQL example
we just talked about, again, one of these examples, what other examples can you
give me on kind of what are higher level questions that we can now
ask our observability platform? What are the typical
things you see in your organization, whether it's on the dev side, on the SRE side,
the DevOps side, the business side? What other questions do you see?
So I'm seeing a lot more.
Used to, I would say, a lot of our questions just went around,
oh, this thing spiked in CPU usage. Why did it
do that? But we can start asking higher level
questions. A lot of stuff will now be related to,
well, the client made this request, which then spun off, let's say, 10 separate requests in the
backend. I can now see, ask, okay, if client does X, Y, and Z, how does that affect my microservices, which now potentially are getting higher usage,
are now using up more memory?
I can now start seeing the links between all of those.
And I can start asking the questions, okay, if client does this, what is the cost associated
with that?
And that's, I think, where we're starting to elevate questions. The questions
are no longer singularly focused on, oh, microservice started using more CPU, so now I
need to up CPU on Kubernetes. I can now ask, well, we added X functionality for the client.
How did that actually affect our entire work stream? And I think that, from especially a business standpoint,
just helps us out so much.
It's no longer a black box anymore
into this kind of microservice and dev world.
Business can actually start asking those questions
and dev can start answering them with ease.
And basically, this already kind of translates, I mean,
great stuff. So instead of
asking, why do we have a CPU
spike? The question is,
do we actually make a profit with the new
features? Or are the features
that we just built actually
cost efficient? Or are
they hindering us? So basically
we're changing the conversation to more like
a business-driven discussion. It's like, hey, is everything
in place so that whatever we provide as an organization
runs within our business constraints? We can actually
afford the hardware, we can afford our, in this case, cloud costs, and we
actually run efficiently.
Then carbon footprint comes also in.
I think this was a big topic also at the recent conferences, also at PERFORM.
Are we looking at our carbon footprint?
So that's phase two.
So to rephrase: OpenTelemetry, and I think you used the word mundane, I would use the word, what did I say earlier? I said it commoditizes, right? It commoditizes observability, which now allows us to really ask higher level questions that
are especially interesting for the ones that run the business, to understand how the system
is running, but then gives enough context to the dev teams to understand where systems
are not running smoothly.
And that's exactly.
Yeah.
So shift-left observability,
that is a topic that in our preparation for this call,
it was something that you were very,
you know, you were very happy to talk about.
And I think we already covered a little bit of this,
like, you know, giving developers insight into a trace
so that they can see what's actually happening.
What else is the benefit for engineers,
development teams to get access to this data?
Is there any, besides just knowing
what the system is doing during development,
what else is the benefit for development?
One of the best things that I think has really come of it
is I think every developer that's
really worked has gotten those midnight
calls where production
is acting up.
No one likes to be on call, but
companies have to do it because
we didn't necessarily
test or we didn't
necessarily run
performance tests on this or we
did run performance tests,
but it wasn't at the scale that our clients are currently calling,
things like that.
And a big part of shift left
is really trying to minimize impact on our devs
and on people as a whole.
I mean, first off, our customers don't want our system hitching in the first place,
but they also don't want it to be down. They want the data and they want to use the platform
on their time. So we're already doing that. We're giving business value back to our consumers.
But I think, at least from a dev's point of view, and even from this kind of performance engineer, however you want to call it,
we're not getting called anymore at midnight
if we're moving all of that type of testing performance
and outlook to shifting it left
and getting it earlier in our dev cycle.
We're not getting the calls
because we tested it all the way,
so that our performance matches
what our consumer activity is like. We've tested and we showcased through our testing, when we moved it left, that it worked in a blue-green deploy in int and QA, so we can now shift it to production. With this kind of shift-left mentality, we're actually helping the developers
not have these calls
outside of the typical nine to five.
They can now feel comfortable,
let's say taking a vacation
or stepping away from the code.
I don't need to be constantly in work mode
because we did this shift left mentality.
And I think that's really the,
we're going to always as a business
want to say that, well, what was the business value? And we can tie it to dollars or consumers
or all of that. But I think from a dev point of view, it's really, I get to now have a life
outside of work. And I think that's something that devs should always be thinking of is that
I'm able to now enjoy my weekends or I'm able to
enjoy a nice leisurely Friday and not be worried necessarily that I'm getting called at weird hours.
Justin, I need to add this to the description because I think it's just the first time where
I heard shift left explained in that way. Because typically, when we talk
about shift left, people say, oh, it means you're putting more stress on the developer, now they
need to do more. But actually you are turning this around and saying shift left is, in the end,
minimizing the impact on our devs. They can focus on their work, nine to five or whatever
their work day is, because we give them all the insights
that they need so that up front
they can be sure
that the system is not going to crash
in the middle of the night.
Because they see that
GraphQL is just like the new
Hibernate and we need to make sure
that we don't have these N plus 1 query problems
because they will kill us in production.
Exactly.
I think, I know like we went through a transformative period
with shift left and I know I felt the pains
just as probably every other dev felt.
I mean, not every dev wants to focus on certain aspects
and I'll say like, I know I'll use kind of negative verbs here,
but testing can be boring.
But from that standpoint, while maybe boring, it's saving you from the potential to not
be able to enjoy time when you're not at work.
And I think that's the piece that really from the dev standpoint is, yes, maybe we
are adding work or we're shifting at least skill set.
I think that's the best way of putting it is you're shifting a skill set.
You're not just focused on dev, you're thinking of it as a whole.
But when you start shifting that skill set, you're also, like we kind of point out, you're minimizing impact on the dev.
And I think that's really the big piece of this.
I just need to take a couple of notes.
It's great.
I will quote you on some of these
in my future presentations, I think.
I've got another question for you.
So you and your organization, OpenTelemetry,
coming back to that topic quickly.
Last week, I talked with a lot of folks at KubeCon in Amsterdam.
And a lot of them are saying, yeah, of course, OpenTelemetry is the observability layer of choice,
clearly on Kubernetes, but it's the number one thing.
But still other people said, well, we don't really know what this really means and how
we actually roll it out in our organization.
So there was a lot of discussion around enablement of development teams.
So the question was actually, if we go with OpenTelemetry,
first of all, what do we need to do and what is already there?
What is already instrumented with OpenTelemetry?
How much then do we additionally need to instrument on our end?
And also the question came up, well, can we do anything wrong?
Can we over-instrument?
What are best practices?
So kind of throwing the question to you,
when you were kind of starting on your OpenTelemetry journey
and you rolled it out and enabled your development teams,
any lessons learned, any things that went well, any things you did that you would do again, any things that didn't work well?
So I will say, one of the things that really stood out to me was,
at least getting the instrumentation in our JavaScript systems, Node.js,
it was literally, I added a simple file.
I think I called it trace.ts or something like that.
I added some very basic things that were on the OpenTelemetry page,
the documentation for it, and I got 80% of the way there. I know it's going to sound like, oh, it's because he's on this observability platform or because he loves open telemetry.
But really, it was crazy how easy it was to get set up and getting to that 80% mark.
I think that was the craziest part for me is I'm used to using a piece of technology
and getting maybe 50% of the way there.
And then you got to start adding your own custom code in
and really tailoring it to your needs.
And OpenTelemetry kind of just gave a lot of stuff to me.
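A trace.ts along the lines Justin describes, modeled on the public OpenTelemetry JS getting-started docs, could look roughly like the sketch below; the service name is made up, and exact packages and options vary by SDK version.

```typescript
// trace.ts - a minimal bootstrap file for Node.js tracing.
import { NodeSDK } from '@opentelemetry/sdk-node';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';

const sdk = new NodeSDK({
  serviceName: 'illustration-backend',               // hypothetical service name
  traceExporter: new OTLPTraceExporter(),            // defaults to a local OTLP endpoint (e.g. a collector)
  instrumentations: [getNodeAutoInstrumentations()], // HTTP, GraphQL, DB clients, and more out of the box
});

sdk.start();
```

Loaded before the application starts (for example via node --require ./trace.js), this is roughly the 80% he mentions; the remaining 20% is custom spans and attributes for domain-specific logic.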
I will say kind of some lessons learned about it though
is I think not from the tech point of view,
but from the business point of view is really bringing in some of the business leaders earlier
in the process, because it was very much, I was kind of experimenting with open telemetry and
just bringing it in to see if we can see anything, seeing the immense value you got, but then going
to business leaders and showing it,
and then them being like, well, why did we add this in? Isn't this what the X platform is meant
for? I think it's kind of bringing business leaders in earlier on that process. But number two,
now from the tech standpoint, was I think engaging with the OpenTelemetry community earlier. There were aspects of
kind of pitfalls that you can run into. So one of them is, and this is very specific to JavaScript,
but if you're on the ECMAScript module system in Node.js already, you won't be able to really get open telemetry,
at least the auto-instrumentation in right now.
And that's due to some packaging issues
and the way the module system works.
So it was interesting to have one of our microservices
already shifted to that.
And so we had to do some custom work to get it actually put in. And I think it was also
kind of lesson learned is understanding what your observability platform may already give you.
So one thing that we ended up finding out is our agent that we have on our system was
automatically picking up OpenTelemetry data for us. But I know other platforms may not have that. And so understanding where
these kind of different pieces of open telemetry come into play. There's things like exporters,
there's things like converters and all of that. And really understanding what your platform may give you and what you may need
I think that was a major piece for us, that was kind of a major lesson learned.
Yeah, I think that was actually a question that I wanted to ask you, because OpenTelemetry
is just one piece of the puzzle, right? Instrumenting the code and basically having
the ability to send this
data to the observability platform. Or if you go all in with OpenTelemetry, you have
your app instrumented OpenTelemetry, then you have an OpenTelemetry collector that needs
to collect the data and that then needs to send it to somewhere where it's actually stored,
persisted, analyzed. And in your case, your platform has already done a lot of the work for you, which seems great, right?
Yeah, exactly.
Is there... because I had a lot of discussions, again coming back to KubeCon, a lot of folks were saying, well, we are going all the way in open source,
you know, with no commercial vendor, no nothing. And I think some folks don't necessarily know maybe
what this really means.
I think OpenTelemetry is great,
but there's more to OpenTelemetry than just instrumenting your app
because this is just giving you the basic kind of opportunity
to actually capture data,
but you still need to collect this data, send it somewhere
in a secure way, analyze
it, make it available again, and this is where, I think, the real value then comes on top.
Exactly.
How can we make use of this data?
Yeah,
that's a very good point.
I think there is, especially for
someone that maybe is not currently in the observability space, they haven't entered it at all. So people that are just using, let's say, Prometheus and Jaeger right now, two of the popular open source tools, they can say: I can just use OpenTelemetry, I can use my exporter and then a collector, and then
ship it off to Jaeger or Prometheus, and figure all of that out. And I would say that's great.
I mean, if that gets you initially in the space and gets you initially seeing value of what something
is providing, that's excellent, because that's then going to allow your business leaders to look at it and be like,
cool, we have this data right now.
But that's the day one operation, right?
You got your data and you're shipping it somewhere.
But now your day two is going to be your business leader coming in and saying,
well, now I want to understand these five KPIs.
How does this data give me those five KPIs?
And maybe your first day two is,
okay, I'm going to write this crazy query for Prometheus
that's going to start tying all this data together.
Or I'm going to write a Jaeger system to understand
how does all these traces now start working with each other?
And how does it hook into my Elasticsearch logs that are sitting
over here? And you're going to start noticing that what was initially easy, getting all
that shipped into just Jaeger, Prometheus, Elasticsearch, is now starting to turn into this
monumental task again. And you're going to be like, well, now I need to hire 10 DevOps people
to really start looking at this data or something like that.
And that's the piece that I think where if you start getting those 10 KPIs that really
tying all that data together is confusing, that's the piece that you're going to start
seeing where commercial vendors or an observability platform is going to start giving you that. Do you want to spend the, let's say, half a million dollars over the year
and training of developers to just go 100% open source?
Some companies may see the value proposition in that.
But for a lot of enterprises, they're going to see it as,
well, that doesn't make any sense. We should be
letting people that are experts in that field do those day two, day three operations.
And I think that's where open telemetry, kind of shifting back to what we said at the beginning,
open telemetry gives us day one and potentially day two. But really, all of those
advanced things that you want, I don't think OpenTelemetry should really go into those because
that's really starting to get into areas where you need to tailor data. And OpenTelemetry is
not going to be able to create standards around tailoring data. That's really up to vendors or companies to figure out
what they need.
Open telemetry to me is,
as you said,
commoditizing data.
That's what open telemetry
should be doing,
commoditizing metrics,
traces, logs, profilers,
things like that.
And then other solutions
should provide the analytics
or the wrapping of all of that and making it nice and neat packages.
I like the way you explained it.
And I think there's another analogy or kind of a similar story with Open Feature.
OpenFeature is another open source project in the CNCF space.
And it's the same thing. So OpenFeature
is standardizing the way developers can implement feature flags in their code.
So it's again independent of your vendor, so you can really get started easily without vendor
lock-in. And then there's also flagd as an open source backend implementation to get you started.
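As a rough sketch of that day-one setup, the OpenFeature Node.js SDK plus the flagd provider looks something like this; package names reflect the public OpenFeature JS packages and may differ by version, and the flag key is invented for illustration.

```typescript
// Vendor-neutral flag evaluation: the OpenFeature API in front, flagd as the backend.
import { OpenFeature } from '@openfeature/server-sdk';
import { FlagdProvider } from '@openfeature/flagd-provider';

async function main() {
  // Point the standard API at a locally running flagd instance.
  // Swapping to a commercial flag backend later only changes this provider line.
  OpenFeature.setProvider(new FlagdProvider({ host: 'localhost', port: 8013 }));

  const client = OpenFeature.getClient();
  const enabled = await client.getBooleanValue('new-illustration-layout', false); // hypothetical flag key
  console.log('feature enabled?', enabled);
}

main().catch(console.error);
```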
And I think that's also what we heard last week at KubeCon because we are active in open feature
because we kicked it off initially last year also with eBay
and some of the other feature-flagging vendors.
And people that came to us last week to the booth,
they said, hey, you know what?
It's really cool.
Open feature is a standard.
FlagD is an open source kind of like your day one.
We can test it out.
We can get started.
But then eventually, we obviously need to go to a commercial version of the backend system
because we need the scale, the analytics, the enterprise features,
like who can change the feature flags, the analytics on top.
This is stuff where we go beyond day one, where we then go day two
and then just operationalize everything.
And it's just the same with what you just told me about OpenTelemetry.
That's a great way to get started.
And definitely it doesn't lock you into anything.
You can walk a long way,
but eventually you should focus again on your core,
on your core business value
and that is not building and maintaining
a complex backend data storage analytics software solution
because this is where commercial vendors come in.
Yeah, exactly.
To me, if it's not your business,
I mean, we've seen companies
where they've built other solutions for things
and then that's how it's spun into.
The best example I can right now think of is like Slack.
Slack, they built that messaging tool
as their internal tool
and they were building a game, if I'm correct,
and then Slack is what took off.
So those stories exist out there. But in most
cases, you're trying to focus on your specific product niche or whatever you're going into.
And you're going to start seeing the exponential increase of trying to get value out of your
analytic solution or your open feature system or whatever it is. And you're going to start seeing,
okay, my homegrown solution just does not compete here.
And that time value proposition
just completely falls off eventually.
And I think really the best devs
or the best businesses are the ones
that are not going to be reactionary to this fall-off.
They're going to start seeing this kind of slow decrease
of value of them building their homegrown and start noticing that, okay, now we need
to switch.
We need to use X system now.
And I think that's really where you see it is.
And I guess this kind of goes back to that shift left mentality, even in the business
realm, where you're not being reactionary anymore.
You're being preventative. You're seeing it up front and making your decisions way sooner than
when it's already fallen off and you've completely missed your KPI or something like that.
Justin, from this conversation, is there anything missing? Or is there, like, if you think about it,
we have people listening to this.
They might be already familiar with OpenTelemetry.
They might be new to OpenTelemetry.
I think we covered a lot about what OpenTelemetry solves.
I think we understand this, right?
It's really the commoditization of how we collect data.
That's great.
I think we also talked about shifting left, but it's really about minimizing the impact on developers. Really great stuff. And also, like, shifting
left is actually shifting the skill set. So in the end, minimize the impact on dev. You
also gave great overview of your rollout experience with OpenTelemetry, 80% just with the default
instrumentation you get with some of these OpenTelemetry frameworks
and libraries that are out there, bringing business leaders early in the process, engaging
with OpenTelemetry community earlier, and also understanding what your observability
platform already provides.
Anything else that if somebody that listens to this wants to now get started in rolling
out observability in the organization, shifting it left that we need to discuss?
I think it's understanding, especially an understanding.
I don't want everyone coming away from this and seeing OpenTelemetry as a silver bullet.
It is still evolving.
I mean, logs just got feature frozen just a few months ago.
So just because it was feature frozen, that now means all the implementations need to
go in.
And so understanding that, and I kind of gave this at Perform also, this call to action of
if you are interested in this, and if it even gives you, let's say that 50% for your day one,
talk and bring up suggestions in the OpenTelemetry group because I can be completely honest,
they are open and they want help
and they want to understand what are your pitfalls.
It's evolving and it's evolving at such a rapid pace
that you can definitely tell
that it's getting a little bit uncomfortable for them
because they're getting so many more users
and they're happy, but they're also like,
well, we still need X, Y, and Z feature added in.
I bring up logs because I was so happy getting traces and metrics.
And then it was like, well, where's the log feature for them?
And it hasn't been built yet.
And so it's understanding that if you even get some value, try and talk with your organization.
Or as a single lone developer, try and work with OpenTelemetry in some capacity because the more we give back in that way on something
that you maybe will take for granted or that business takes for granted or whoever, you
may not see the value initially, it's going to provide value eventually.
And I think those are the pieces that we need to see more,
I would say, devs even getting in the space. Because the more developers,
the boots-on-the-ground people that are working directly on code and looking at
traces or whatever it is, get involved, the more our commoditizing of data is just going to keep increasing. And that's really what I want to see.
It's seeing more of these devs getting involved to help commoditize this data even more.
Yeah. So shout out to everyone out there listening.
OpenTelemetry is one of the CNCF projects that is definitely not only worth looking into
because it benefits you, but also one that I think you can contribute back to.
It's ever evolving, as you said.
There's still a lot of work to be done.
But yeah, it's amazing how far the project already came.
And what I really like, being in the observability space,
is looking at it and seeing how it actually brings together
companies that are normally rivals on the market, right? Like, if you look at it, we as Dynatrace, and also Datadog and New
Relic and Honeycomb, we're all contributing to this, because in the end it benefits
us, obviously, right? Because we're no longer solely depending on building our own agent technologies.
And we can also contribute back to actually get the data that we need in order to provide higher level value with our observability platforms.
Yeah, I mean, and it makes sense.
Yeah, I guess that's the piece that I know I've had some discussion with devs and they'd be like,
well, what's the reason why X observability platform is buying into this?
And I kind of explained it in this way of, well, okay, you're a dev and you work for a company.
You don't want to keep writing the same form so many times.
I don't think our observability platform developers and business leaders want to keep rewriting traces over and over again.
They want to work on the cool stuff the same way you want to work on the cool stuff.
Yeah.
Hey, last question for you.
Are there any resources that you have used when you got started with OpenTelemetry?
Any particular people to follow?
Any, I don't know, anything where you say, hey, this was really good that I have this resource available? So I will say OpenTelemetry.io,
their website, their docs are great. There are definitely still holes,
but they probably provided some of the best resources. Other than that, I would say
get on Slack, get on the CNCF Slack channel and start going into the OpenTelemetry groups.
So there's OpenTelemetry, just the base one, but then there's OTel FaaS, OTel JS, all the different specifics, and just start asking questions in there.
The founder of Open Telemetry, absolutely excellent.
I got a chance when I was doing some, they were doing user research,
got a chance to talk with them.
But any of them, go in those channels,
because you're going to also run
into developers from different pieces. So like OTel FaaS, you have AWS developers
in there, you have Azure developers in there. And so I would say really diving into those
two will really just elevate your experience. And then I can add two additional things.
We just published a podcast with the author of Practical Open Telemetry.
The podcast episode is called Adopting Open Observability Across Your Organization.
And our guest was Daniel Gomez Blanco.
So he just published a book on open telemetry.
And the other thing I want to highlight,
Henrik Rexed, who is also working with me,
he has the Is It Observable channel.
So isitobservable.io,
and he's been covering open telemetry quite a bit.
And he also put out YouTube tutorials
and GitLab tutorials to just get started with some of this.
Cool.
All right.
Justin, thank you.
Thank you so much.
Actually, I believe you will probably meet Henrik in a couple of weeks
because there's going to be GlueCon.
I'm not sure if you are at GlueCon.
Oh, I'm not.
I swear.
I wish.
Okay.
I just got back from a vacation, so I can't take off of work again.
Okay.
Because I think you're in Colorado, correct?
Correct.
Correct.
Yeah, because GlueCon is in Denver in a couple of weeks.
Oh.
And Henrik is going to be there.
We have some other folks from Dynatrace going to be there.
So that's going to be a good opportunity. So whoever is listening, if you're in the Denver area or if you go to GlueCon,
you may want to ask Henrik about some OpenTelemetry advice because he knows his stuff.
Yeah.
All right.
With this, I say sorry, Brian Wilson, that you couldn't be my co-host today.
Hopefully I did a good enough job to do this interview
myself. But
thanks anyway because Brian
is the person that makes sure that
all of this gets post-processed and
packaged up and then shipped to the
internet so that people can actually listen to it
so thank you so much. And thank you, Justin.
Thank you.
Bye-bye.
Bye.