The Good Tech Companies - Microservices Observability: Leveraging OpenTelemetry in Real-World Systems by Gajinder Sharma
Episode Date: July 4, 2025This story was originally published on HackerNoon at: https://hackernoon.com/microservices-observability-leveraging-opentelemetry-in-real-world-systems-by-gajinder-sharma. ... Learn how to instrument Node.js microservices using OpenTelemetry for end-to-end tracing, debugging, and performance monitoring across services. Check more stories related to tech-stories at: https://hackernoon.com/c/tech-stories. You can also check exclusive content about #microservice-observability, #opentelemetry, #r-systems-blogbook, #microservices-tracing, #jaeger-opentelemetry-setup, #distributed-systems-debugging, #node.js-observability, #good-company, and more. This story was written by: @rsystems. Learn more about this writer by checking @rsystems's about page, and for more stories, please visit hackernoon.com. Debugging distributed systems is painful without proper observability. This guide shows how we implemented OpenTelemetry in our Node.js microservices stack—from setup to best practices—boosting traceability, performance monitoring, and cross-service debugging with tools like Jaeger and Grafana.
Transcript
This audio is presented by Hacker Noon, where anyone can learn anything about any technology.
Microservices Observability: Leveraging OpenTelemetry in Real-World Systems, by Gajinder Sharma, by R Systems. As a backend developer working with microservices
for the past few years, one truth has become painfully obvious: debugging production issues
across distributed systems can feel like detective work in the dark. You've got services calling services, sometimes dozens deep. A user clicks a button on the
UI, and 15 microservices spin into action. If something breaks, or worse, just slows
down, figuring out where and why can chew up hours.
This is exactly why observability matters. And if you're building or maintaining microservices
in 2024,
OpenTelemetry is the tool you want in your corner.
What even is observability?
Really, observability is more than just logs.
It's about understanding why your system is behaving a certain way,
not just what it's doing.
At the core, we're talking about three pillars.
Logs: raw events, helpful for debugging.
Metrics: numbers you can track over time, e.g. request count, CPU.
Traces: end-to-end request flows across services, aka your distributed call stack.
Traditional monitoring tools mostly focus on metrics and logs, but tracing is the real game changer for microservices.
Why we picked OpenTelemetry. We experimented with several observability stacks, Datadog,
New Relic, Prometheus, Jaeger, Zipkin, but they all had one problem:
either they were vendor-locked or lacked consistency across languages.
OpenTelemetry (OTel) checked all our boxes. Open source, under the CNCF. Works across languages (we use Node.js, Go, and Python).
Vendor neutral: export to Grafana, Jaeger, New Relic, etc.
Supported by everyone in the industry, literally: AWS, GCP, Microsoft, etc.
How we use OpenTelemetry in Node.js microservices. Let me walk you through how we actually
instrumented a real service. Let's say we've got a simple user service built in Node.js using
Express. It exposes an endpoint, `/users`, that fetches user data. Below are the steps.
Step 1. Install dependencies. We're going to export traces via OTLP to a local Jaeger instance.
Step 2. Create tracing.js to initialize OpenTelemetry.
Step 3. Add it to your entry file. Our service is now exporting traces.
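The code behind steps 1 to 3 was elided from the recording; here is a minimal sketch of a tracing.js, assuming the standard OTLP/HTTP exporter packages and auto-instrumentation (the package names are real OpenTelemetry packages, but the service name user-service and the local Jaeger endpoint are illustrative):

```javascript
// tracing.js — initialize OpenTelemetry before anything else loads.
// Step 1 installs the packages this file assumes:
//   npm install @opentelemetry/sdk-node @opentelemetry/auto-instrumentations-node \
//               @opentelemetry/exporter-trace-otlp-http
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-http');
const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node');

const sdk = new NodeSDK({
  serviceName: 'user-service',
  // Export via OTLP/HTTP to a local Jaeger instance (default OTLP port 4318).
  traceExporter: new OTLPTraceExporter({ url: 'http://localhost:4318/v1/traces' }),
  // Auto-instruments Express, http, and other popular libraries.
  instrumentations: [getNodeAutoInstrumentations()],
});

sdk.start();
```

Step 3 is then a single line at the very top of the service's entry file, before Express or anything else is required: require('./tracing').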
Step 4. Spin up Jaeger locally, or use Grafana Tempo. Here's how we test locally: visit http://localhost:16686
to view your traces. Chaining traces across services. Now say you have another service,
order service, that calls user service. If both are instrumented with OpenTelemetry, you'll
get a full trace of the user request hopping between them. And the best part? OpenTelemetry handles trace context propagation via HTTP headers automatically.
You don't have to manually pass trace IDs between services.
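Under the hood, that propagation uses the W3C Trace Context traceparent HTTP header, which the instrumented HTTP client injects on outgoing requests. A dependency-free sketch of what that header carries (the IDs in the example value are made up for illustration):

```javascript
// A W3C traceparent header has four dash-separated fields:
// version-traceid-parentspanid-flags. The receiving service's instrumentation
// reads it and attaches its spans to the same trace.
function parseTraceparent(header) {
  const [version, traceId, parentSpanId, flags] = header.split('-');
  return { version, traceId, parentSpanId, sampled: (parseInt(flags, 16) & 1) === 1 };
}

// Illustrative header value, as order service might send it to user service:
const ctx = parseTraceparent('00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01');
console.log(ctx.traceId); // the shared trace ID both services report spans under
console.log(ctx.sampled); // the sampling decision travels with the request
```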
Adding custom spans for business logic. Sometimes auto-instrumentation isn't enough.
For example, if you want to trace a DB query or external API call, a custom span is the tool. This is super
helpful when you want to track the performance of
specific business logic. Best practices we've learned the hard way. 1. Use semantic conventions
instead of inventing your own attribute names. Stick with the open telemetry semantic conventions.
These make your traces easier to understand and compatible with tools like Grafana, Tempo, etc.
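A sketch combining the two ideas above, a custom span around a database call that uses semantic-convention attribute names rather than invented ones (the tracer name, the query, and the db variable are illustrative; db stands in for whatever database client the service uses):

```javascript
// Wrap a DB query in a custom span, tagged with semantic-convention
// attributes (db.system, db.statement) so backends like Jaeger and
// Grafana Tempo can interpret it consistently.
const { trace, SpanStatusCode } = require('@opentelemetry/api');
const tracer = trace.getTracer('user-service');

async function getUserById(db, id) {
  return tracer.startActiveSpan('db.getUserById', async (span) => {
    try {
      span.setAttribute('db.system', 'postgresql');
      span.setAttribute('db.statement', 'SELECT * FROM users WHERE id = $1');
      return await db.query('SELECT * FROM users WHERE id = $1', [id]);
    } catch (err) {
      // Record the failure on the span so the trace shows where things broke.
      span.recordException(err);
      span.setStatus({ code: SpanStatusCode.ERROR });
      throw err;
    } finally {
      span.end();
    }
  });
}
```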
2. Sample wisely. If you trace every single request, your system will drown in data.
Use trace sampling, e.g. 10% or only errors.
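The 10% example can be expressed as a head-based sampler in the SDK config; a sketch, assuming the same NodeSDK setup as in tracing.js (the samplers shown are real OpenTelemetry classes, re-exported by the Node trace SDK package):

```javascript
// Keep roughly 10% of traces, and respect the parent's decision so a
// trace is never half-sampled across services.
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { ParentBasedSampler, TraceIdRatioBasedSampler } = require('@opentelemetry/sdk-trace-node');

const sdk = new NodeSDK({
  // ...exporter and instrumentations as in tracing.js...
  sampler: new ParentBasedSampler({ root: new TraceIdRatioBasedSampler(0.1) }),
});
```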
3. Use the OpenTelemetry Collector in production. Don't export telemetry data directly from your services to your backend. Route it through the OpenTelemetry Collector.
It gives you buffering, batching, retries, and format conversion.
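In code, that usually just means pointing the exporter at the Collector instead of the backend; a sketch (the otel-collector hostname is illustrative, the default OTLP/HTTP port is 4318):

```javascript
// In production, export to the Collector; it handles batching, retries,
// and fan-out to Jaeger, Grafana Tempo, or any other backend.
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-http');

const traceExporter = new OTLPTraceExporter({
  url: 'http://otel-collector:4318/v1/traces',
});
```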
4. Don't log PII in spans. This one's critical.
Be super careful not to store usernames, emails, credit card info, etc. in span attributes or logs.
Stick to metadata and identifiers.
Where this has helped us most.
Debugging latency issues: seeing full traces across 4 to 5 microservices helped us identify bottlenecks in minutes.
Identifying retry storms: we spotted a service calling another in a loop with retries,
something we wouldn't have caught via logs.
Deployment regressions: comparing traces from one version to the next showed us exactly what changed.
Bonus: tracing in a multi-language stack. We're using Node.js for some services, Go for others.
OpenTelemetry made it easy to instrument both
and send all data to a single place:
Jaeger for dev, Grafana Cloud in staging and prod.
No vendor lock-in, no mismatch in trace formats.
Just pure visibility.
Conclusion. If you're building microservices, start with observability.
Microservices give us scale and flexibility, but they also bring complexity.
Without proper observability, you're flying blind. OpenTelemetry has become a core part of our architecture, not just for debugging, but for optimizing performance, reliability,
and ultimately, the user experience.
If you're not using it yet, I strongly recommend giving it a shot.
Even a basic setup with Jaeger and a couple services will make you wonder how you ever
lived without it.
Thank you for listening to this Hacker Noon story, read by Artificial Intelligence.
Visit HackerNoon.com to read, write, learn and publish.
