[BUG] distributed tracing inconsistency

Library name and version

Azure.Messaging.EventHubs 5.9.2

Describe the bug

EventHubBufferedProducerClient.EnqueueEventAsync(eventData) is used to enqueue a message. I think it’s safe to assume that the activity used to enqueue that message has since ended and is no longer in scope.

The following were all set when the message was enqueued:

AppContext.SetSwitch("Azure.Experimental.EnableActivitySource", true);
eventData.Properties.TryAdd("traceparent", Activity.Current?.Id);
eventData.Properties.TryAdd("tracestate", Activity.Current?.TraceStateString);

The distributed traces in Application Insights do not link the consumer traces to the producer traces unless I manually set _telemetryClient.Context.Operation.ParentId = args.Properties["traceparent"] in the corresponding consumer service application. If a chain of services follows, the same holds for them: neither their producer traces nor their consumer traces show up in the distributed trace.

Expected behavior

I feel like the SDK has all the information it needs to reconstruct the activity, so that consumer clients don’t need to set _telemetryClient.Context.Operation.ParentId = args.Properties["traceparent"]. Moreover, any traces, requests, or dependencies are not correlated if they occur before I have the opportunity to set that ParentId.

It should also probably be documented that the BufferedProducerClient requires the setting of those three properties/appswitches so that full distributed tracing works when using a buffered producer. It wasn’t documented anywhere I could find and I had to read the source code to figure it out.

Actual behavior

Without setting _telemetryClient.Context.Operation.ParentId = args.Properties["traceparent"], the distributed trace contains none of the tracing that would include or follow that consumer service and its downstream services.

Reproduction Steps

Use EventHubBufferedProducerClient.EnqueueEventAsync(eventData) to enqueue a message. Ensure that the activity used to enqueue that message has since ended and is no longer in scope.

Set the following when the message is enqueued:

AppContext.SetSwitch("Azure.Experimental.EnableActivitySource", true);
eventData.Properties.TryAdd("traceparent", Activity.Current?.Id);
eventData.Properties.TryAdd("tracestate", Activity.Current?.TraceStateString);

Use a consumer service application (in our case, one built on EventProcessorClient) to consume a message from that event hub and then write it to another event hub. Then check the transactions in Application Insights for the original request. The traces for the first producer and first consumer app are there, but not those for the downstream consumer app. The first consumer’s requests, dependencies, and links are also missing from the consumer app’s timeline, so the chain of services that follows does not appear there, and the second producer’s and second consumer’s traces, requests, and dependencies are all missing.

Environment

❯ dotnet --info
.NET SDK:
 Version:   7.0.400
 Commit:    73bf45718d

Runtime Environment:
 OS Name:     Mac OS X
 OS Version:  13.4
 OS Platform: Darwin
 RID:         osx.13-arm64
 Base Path:   /usr/local/share/dotnet/sdk/7.0.400/

Host:
  Version:      7.0.10
  Architecture: arm64
  Commit:       a6dbb800a4

.NET SDKs installed:
  7.0.400 [/usr/local/share/dotnet/sdk]

.NET runtimes installed:
  Microsoft.AspNetCore.App 7.0.10 [/usr/local/share/dotnet/shared/Microsoft.AspNetCore.App]
  Microsoft.NETCore.App 7.0.10 [/usr/local/share/dotnet/shared/Microsoft.NETCore.App]

Other architectures found:
  None

Environment variables:
  Not set

global.json file:
  Not found

Learn more:
  https://aka.ms/dotnet/info

Download .NET:
  https://aka.ms/dotnet/download

Top GitHub Comments

serpentfabric commented, Aug 18, 2023

I can confirm that the manual workaround works as intended.

lmolkova commented, Aug 17, 2023

@serpentfabric

I believe you’re talking about two separate issues here:

1. EventHubBufferedProducerClient does not populate trace context on the messages. This is indeed a bug that we should fix.
2. Downstream calls on the consumer are not correlated to the message processing. This is something we don’t quite understand: the fact that EventProcessor.Process and EventProcessor.Checkpoint are captured and correlated means that we populated Activity.Current and it points to EventProcessor.Process, so anything that happens within it would be correlated.

Before we dive into p2, could you please try something:

- If you’re using ApplicationInsights SDK, please remove AppContext switches; they are only needed when using OpenTelemetry.
- Instead of manually populating traceparent, please populate Diagnostic-Id (this is what is used by ApplicationInsights SDK integration): eventData.Properties.TryAdd("Diagnostic-Id", Activity.Current?.Id);
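
As a rough illustration of why the id has to be captured at enqueue time, here is a minimal sketch that uses only System.Diagnostics. The dictionary is a hypothetical stand-in for eventData.Properties so the snippet runs without the Azure SDK, and the activity name "Enqueue" is made up; this is not how the SDK itself propagates context.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical stand-in for eventData.Properties (assumption: no Azure SDK referenced here).
IDictionary<string, object?> properties = new Dictionary<string, object?>();

string? idAtEnqueue;
using (var enqueue = new Activity("Enqueue").Start())
{
    // Capture the id while the enqueueing activity is still Activity.Current.
    idAtEnqueue = Activity.Current?.Id;
    properties.TryAdd("Diagnostic-Id", idAtEnqueue);
}

// The activity has ended, as it may have by the time a buffered producer publishes.
bool noAmbientActivity = Activity.Current is null;
Console.WriteLine($"Activity.Current is null after enqueue: {noAmbientActivity}");
Console.WriteLine($"Diagnostic-Id still on the event: {properties["Diagnostic-Id"]}");
```

The point of the sketch is only that the property set at enqueue time outlives the activity, whereas anything that reads Activity.Current later finds nothing to propagate.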

Please provide an example of the trace you see, and share what should be there but is not correlated.

Also, setting _telemetryClient.Context.Operation.ParentId = args.Properties["traceparent"] is quite error-prone: it sets the parent for everything emitted with this _telemetryClient instance and would be wrong if there is any concurrency. This definitely should not be necessary, and we expect that once we fix p1 above, or with the manual workaround using Diagnostic-Id, it won’t be needed.
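
To make the concurrency concern concrete, here is a small sketch, again using only System.Diagnostics, of the alternative to mutating shared client state: each handler restores its own parent on an Activity, and Activity.Current is tracked per async flow. ProcessAsync and the two traceparent values are hypothetical and exist only for illustration; this is not the EventProcessorClient implementation.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical handler: each call restores its own parent from the incoming id.
static async Task<bool> ProcessAsync(string incomingId)
{
    using var activity = new Activity("Process").SetParentId(incomingId).Start();
    await Task.Delay(20); // let the two handlers overlap
    // Activity.Current is async-local, so it still refers to this handler's activity.
    return Activity.Current?.ParentId == incomingId;
}

// Two made-up W3C traceparent values standing in for ids read from two events.
string first = "00-11111111111111111111111111111111-2222222222222222-01";
string second = "00-33333333333333333333333333333333-4444444444444444-01";

bool[] results = await Task.WhenAll(ProcessAsync(first), ProcessAsync(second));
Console.WriteLine($"Each handler kept its own parent: {results[0] && results[1]}");
```

A single ParentId field on a shared client has no such isolation: whichever handler wrote it last would decide the parent for telemetry emitted by both.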

Thanks!
