A request enters your system through an API gateway, hits an authentication service, queries a database, calls a payment provider, publishes an event to a message queue, and returns a response. When that request takes 4 seconds instead of 400 milliseconds, which service is responsible?
Without distributed tracing, you open five dashboards, compare timestamps in five different log streams, and try to reconstruct the request path from memory. With distributed tracing, you open one trace and see every hop, every duration, and every failure in a single view.
Distributed tracing is the practice of propagating a unique identifier through every service that handles a request, recording the work each service does as spans, and assembling those spans into a trace that represents the request's complete journey.
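To make that definition concrete, here is a minimal single-process sketch in Python. It is not a real tracing library: the header names `x-trace-id` and `x-parent-span-id`, the in-memory `COLLECTOR` list, and the three stand-in "services" are all assumptions made for illustration. Real systems would typically rely on a standard propagation format and a tracing backend.

```python
import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Optional

# Hypothetical header names, chosen for this sketch only.
TRACE_HEADER = "x-trace-id"
PARENT_HEADER = "x-parent-span-id"


@dataclass
class Span:
    trace_id: str
    span_id: str
    parent_id: Optional[str]
    name: str
    start: float
    end: float = 0.0

    @property
    def duration_ms(self) -> float:
        return (self.end - self.start) * 1000


# Stand-in for a tracing backend: every service reports finished spans here.
COLLECTOR = []


@contextmanager
def span(name, headers):
    """Record one named, timed operation; yield headers for downstream calls."""
    # Reuse the incoming trace ID, or start a new trace at the entry point.
    trace_id = headers.get(TRACE_HEADER) or uuid.uuid4().hex
    current = Span(trace_id, uuid.uuid4().hex[:16],
                   headers.get(PARENT_HEADER), name, time.perf_counter())
    try:
        # Downstream calls carry the same trace ID and this span as parent.
        yield {TRACE_HEADER: trace_id, PARENT_HEADER: current.span_id}
    finally:
        current.end = time.perf_counter()
        COLLECTOR.append(current)


def auth_service(headers):
    with span("Validate JWT", headers):
        time.sleep(0.01)  # simulated work


def database(headers):
    with span("Query user table", headers):
        time.sleep(0.02)  # simulated work


def gateway(headers):
    with span("API gateway", headers) as outgoing:
        auth_service(outgoing)
        database(outgoing)


def assemble(trace_id):
    """Collect one trace's spans and return (depth, span) pairs in tree order."""
    children = {}
    spans = [s for s in COLLECTOR if s.trace_id == trace_id]
    for s in sorted(spans, key=lambda s: s.start):
        children.setdefault(s.parent_id, []).append(s)

    ordered = []

    def walk(parent_id, depth):
        for s in children.get(parent_id, []):
            ordered.append((depth, s))
            walk(s.span_id, depth + 1)

    walk(None, 0)  # the root span has no parent
    return ordered


gateway({})  # an external request arrives with no trace headers
trace = assemble(COLLECTOR[0].trace_id)
for depth, s in trace:
    print(f"{'  ' * depth}{s.name}: {s.duration_ms:.1f} ms")
```

Running it prints the gateway span with the two downstream spans indented beneath it, each with its own duration. The point of the design is that no service needs to know the whole request path: each one only forwards the trace ID and its own span ID, and the tree is rebuilt afterwards from the parent links.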
The mental model: spans and traces
A span is a named, timed operation. "Query user table" is a span. "Call Stripe API" is a span. "Validate JWT" is a span. Each span records:








