
[PROPOSAL] First-class distributed tracing

See original GitHub issue

Over the past weeks, I’ve been struggling to get distributed tracing to work.

My observation is that a Java application using the Dapr SDK for Java does not propagate the traceparent HTTP header from a call it receives from the sidecar to subsequent calls it makes back to the sidecar (e.g. invokeMethod or publishEvent).

Looking at the code, I think it wouldn't work either if the sidecar used gRPC to invoke the Java application: the grpc-trace-bin, traceparent and tracestate headers are propagated, but they seem to be read from an empty Context. However, I haven't been able to get the sidecar to talk gRPC to my Java apps, so I'm not 100% sure about this.

Nevertheless, in a platform that wants to simplify distributed systems, I think tracing is an essential thing. I’d love to see the Dapr SDK for Java make it as easy as possible to leverage what Dapr has to offer in this field.

Describe the proposal

Provide the Dapr client code (at least DaprClientBuilder, DaprHttpBuilder, DaprClientHttp and DaprClientGrpc classes, maybe more) with a “strategy” to get a traceparent header value. Use that strategy to enrich calls from the application to the sidecar.

The default implementation should return an empty value. A user can supply their own implementation based on the frameworks they are using. For instance, I could envision an implementation that uses OpenTelemetry’s Span class to find the correct value for the traceparent header.
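To make this concrete, here is a minimal sketch of what an OpenTelemetry-based strategy could look like. The class name is hypothetical and not part of the Dapr SDK; it only assumes the opentelemetry-api artifact, derives the W3C traceparent value from the current span, and returns an empty result when there is no active span.

import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.SpanContext;
import java.util.Optional;

// Hypothetical strategy: derives a W3C traceparent value from the current
// OpenTelemetry span. The class name is illustrative, not an existing SDK type.
public class OpenTelemetryTraceparentStrategy {

    public Optional<String> currentTraceparent() {
        SpanContext spanContext = Span.current().getSpanContext();
        if (!spanContext.isValid()) {
            return Optional.empty(); // no active span: fall back to the default (empty) value
        }
        // traceparent format: version "00", trace-id, parent-id (span-id), trace-flags
        return Optional.of(String.format("00-%s-%s-%s",
            spanContext.getTraceId(),
            spanContext.getSpanId(),
            spanContext.getTraceFlags().asHex()));
    }
}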

Issue Analytics

  • State: open
  • Created: 2 years ago
  • Reactions: 2
  • Comments: 10 (7 by maintainers)

Top GitHub Comments

1 reaction
mthmulders commented, Nov 23, 2021

Good news - I was able to get distributed tracing to work with Dapr & Spring Sleuth, too. Even better, I can make that fit in the same structure as the one that is used with OpenTelemetry: a Function<reactor.util.context.Context, reactor.util.context.Context>.

Since Sleuth auto-populates the Reactor Context with the Sleuth TraceContext, the “tracing context enricher” could be a singleton - its logic is completely implemented without instance variables.
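As a sketch (not part of the SDK), such a singleton enricher might look like the code below. It assumes Sleuth has stored the current TraceContext in the Reactor Context under the TraceContext.class key, and that a "traceparent" entry in the enriched context is what the Dapr client would copy onto the outgoing call - both of those are assumptions, not confirmed SDK behaviour.

import java.util.function.Function;
import org.springframework.cloud.sleuth.TraceContext;
import reactor.util.context.Context;

// Hypothetical singleton enricher. Assumes Sleuth keys the current TraceContext
// by TraceContext.class in the Reactor Context.
public enum SleuthTracingContextEnricher implements Function<Context, Context> {
    INSTANCE;

    @Override
    public Context apply(Context context) {
        TraceContext traceContext = context.getOrDefault(TraceContext.class, null);
        if (traceContext == null) {
            return context; // no active trace: leave the context untouched
        }
        String flags = Boolean.TRUE.equals(traceContext.sampled()) ? "01" : "00";
        String traceparent = String.format("00-%s-%s-%s",
            traceContext.traceId(), traceContext.spanId(), flags);
        // "traceparent" as the context key is an assumption about what the client copies to headers.
        return context.put("traceparent", traceparent);
    }
}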

For OpenTracing, the “enricher” does need an instance variable (see this example), but I think that’s not too big of a problem.
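A hedged sketch of such an OpenTracing enricher, with the Tracer as its instance variable; it uses the standard tracer.inject call with the HTTP_HEADERS format, so the exact entries written into the Reactor Context depend on the tracer's configured propagation. The class name is illustrative.

import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import io.opentracing.Span;
import io.opentracing.Tracer;
import io.opentracing.propagation.Format;
import io.opentracing.propagation.TextMapAdapter;
import reactor.util.context.Context;

// Hypothetical enricher holding a Tracer instance variable. Injects the active
// span's context into a header map and copies those entries into the Reactor Context.
public class OpenTracingContextEnricher implements Function<Context, Context> {

    private final Tracer tracer;

    public OpenTracingContextEnricher(Tracer tracer) {
        this.tracer = tracer;
    }

    @Override
    public Context apply(Context context) {
        Span activeSpan = tracer.activeSpan();
        if (activeSpan == null) {
            return context; // no active span: nothing to propagate
        }
        Map<String, String> headers = new HashMap<>();
        tracer.inject(activeSpan.context(), Format.Builtin.HTTP_HEADERS, new TextMapAdapter(headers));
        Context enriched = context;
        for (Map.Entry<String, String> header : headers.entrySet()) {
            enriched = enriched.put(header.getKey(), header.getValue());
        }
        return enriched;
    }
}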

Aside

The Sleuth integration works when the app communicates with the sidecar over gRPC, but not when they use HTTP. Sleuth puts a few more entries in the Reactor Context, and some of their names contain a space, e.g. "interface org.springframework.cloud.sleuth.Tracer". Even when you remove those entries in a contextWrite(...), they pop back up when the Reactor Context is used to write HTTP headers; and HTTP header names do not allow spaces. I think the fact that you can’t remove something from the context is a bug, but I need to dive into that further.

So, to get back to the original proposal:

Describe the proposal

Provide the Dapr client with a “strategy” to enrich the Reactor Context with tracing information. The DaprClientGrpc and DaprClientHttp classes will invoke that strategy to enrich calls from the application to the sidecar with a tracing context.

The contract would at the very minimum look like this:

public interface DaprTelemetryInjector
    extends java.util.function.Function<reactor.util.context.Context, reactor.util.context.Context> {
}

To nudge implementors towards providing useful values, we could make it a bit more expressive by giving apply a default implementation that delegates to two abstract methods:

public interface DaprTelemetryInjector
    extends java.util.function.Function<reactor.util.context.Context, reactor.util.context.Context> {

    @Override
    default reactor.util.context.Context apply(final reactor.util.context.Context context) {
        // Enrich the incoming context with the results of calculateTraceState and
        // calculateTraceParent, provided they do not return null. The context keys
        // mirror the W3C trace-context header names.
        reactor.util.context.Context enriched = context;
        String traceState = calculateTraceState(context);
        if (traceState != null) {
            enriched = enriched.put("tracestate", traceState);
        }
        String traceParent = calculateTraceParent(context);
        if (traceParent != null) {
            enriched = enriched.put("traceparent", traceParent);
        }
        return enriched;
    }

    String calculateTraceState(reactor.util.context.Context context);

    String calculateTraceParent(reactor.util.context.Context context);
}

The default implementation could return an empty value. The Dapr SDK for Java could provide out-of-the box implementations for OpenTracing and Sleuth, two popular tracing libraries. A user could of course also supply their own implementation based on the frameworks they are using.
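For illustration, the no-op default could be as small as the sketch below: both methods return null, so the default apply above leaves the Reactor Context untouched. The class name is hypothetical.

// Minimal no-op implementation: contributes no values, so the context passes through unchanged.
public class NoopTelemetryInjector implements DaprTelemetryInjector {

    @Override
    public String calculateTraceState(final reactor.util.context.Context context) {
        return null; // nothing to contribute
    }

    @Override
    public String calculateTraceParent(final reactor.util.context.Context context) {
        return null; // nothing to contribute
    }
}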

Because those implementations depend on external libraries, we would introduce “optional” dependencies on both the OpenTracing and the Sleuth APIs.

0 reactions
mthmulders commented, Apr 19, 2022

Will any of the implementations have any Dapr specific logic?

I don’t think so. My expectation is that it’s more a matter of wiring the particular tracing library (Sleuth, OpenTracing, etc.) into the Dapr SDK.

We also want to minimize dependencies, so adding implementations means we need to create new artifacts to publish to Maven, one per framework. This way, users can opt-in to our implementation for each one - but I would like to avoid going down that path initially simply because of the maintenance effort to keep each one up to date and deal with breaking changes in dependencies.

Minimising dependencies absolutely makes sense to me, and I think unless there’s a lot of demand from the community, we shouldn’t be providing our own implementations at this point. We could deliver sample implementations, or, as @wmeints suggests, a blog post that explains how to do it.

My main doubt (still!) with the idea is whether it’s actually going to work. I simply don’t know for sure whether we can guarantee that the Reactor Context we pass to the interface implementation is going to be the correct one; I lack some in-depth knowledge about Reactor to be sure about it. The implementations will need to be 100% thread-safe, that’s for sure. And let’s face it: if we passed the wrong Reactor Context, we’d be worse off than the current situation, which simply doesn’t pass any Context at all.
