[BUG] distributed tracing inconsistency
Library name and version
Azure.Messaging.EventHubs 5.9.2
Describe the bug
A message is enqueued with EventHubBufferedProducerClient.EnqueueEventAsync(eventData);
I think it's safe to assume that the activity used to enqueue that message has since ended and is no longer in scope.
The following were all set when that message was enqueued:
AppContext.SetSwitch("Azure.Experimental.EnableActivitySource", true)
eventData.Properties.TryAdd("traceparent", Activity.Current?.Id);
eventData.Properties.TryAdd("tracestate", Activity.Current?.TraceStateString);
The distributed traces in Application Insights do not link the consumer traces to the producer traces unless I manually set _telemetryClient.Context.Operation.ParentId = args.Properties["traceparent"]
in the corresponding consumer service application. If there is a chain of services downstream, the same holds for them: neither the producer nor the consumer traces that follow show up in the distributed trace.
Expected behavior
I feel like the SDK has all the information it needs to reconstruct the activity, so that consumer clients don't need to set _telemetryClient.Context.Operation.ParentId = args.Properties["traceparent"]. Moreover, any traces, requests, or dependencies that occur before I have the opportunity to set that ParentId are not correlated.
It should also probably be documented that the BufferedProducerClient requires the setting of those three properties/appswitches so that full distributed tracing works when using a buffered producer. It wasn’t documented anywhere I could find and I had to read the source code to figure it out.
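To illustrate the point about the information already being on the message: the traceparent and tracestate values are enough to rebuild a parent context with the standard System.Diagnostics APIs. The sketch below is not the SDK's implementation. The helper name StartFromEventProperties and the default activity name "ConsumerProcess" are made up for illustration, and it assumes the properties hold W3C-format strings like the ones set above.

```csharp
using System.Collections.Generic;
using System.Diagnostics;

public static class TraceContextReader
{
    // Hypothetical helper (not part of Azure.Messaging.EventHubs): rebuilds the
    // parent context from event properties and starts a child Activity under it.
    // Returns null when no usable traceparent is present.
    public static Activity? StartFromEventProperties(
        IDictionary<string, object> properties,
        string operationName = "ConsumerProcess")
    {
        if (!properties.TryGetValue("traceparent", out var raw) || raw is not string traceParent)
            return null;

        // tracestate is optional; a missing or non-string value becomes null.
        properties.TryGetValue("tracestate", out var rawState);

        if (!ActivityContext.TryParse(traceParent, rawState as string, out ActivityContext parent))
            return null;

        var activity = new Activity(operationName);
        activity.SetParentId(parent.TraceId, parent.SpanId, parent.TraceFlags);
        activity.TraceStateString = parent.TraceState;

        // Start() makes this Activity the ambient Activity.Current for the
        // current async flow only, so concurrent handlers do not interfere.
        return activity.Start();
    }
}
```

A consumer handler could wrap its work in a using block around the returned activity (Activity is disposable and stops on dispose), passing args.Data.Properties if it uses the EventProcessorClient.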
Actual behavior
Without setting _telemetryClient.Context.Operation.ParentId = args.Properties["traceparent"], the distributed trace shows none of the distributed tracing that would include or follow that consumer service and its downstream services.
Reproduction Steps
Use EventHubBufferedProducerClient.EnqueueEventAsync(eventData);
to enqueue a message. Ensure that the activity used to enqueue that message has since ended and is no longer in scope.
Set all of the following when the message is enqueued:
AppContext.SetSwitch("Azure.Experimental.EnableActivitySource", true)
eventData.Properties.TryAdd("traceparent", Activity.Current?.Id);
eventData.Properties.TryAdd("tracestate", Activity.Current?.TraceStateString);
Use a consumer service application (in our case, the EventProcessorClient) to consume a message from that event hub and write it to another event hub. Then check the transactions in Application Insights for the original request. The traces for the first producer and the first consumer app are there, but not those for the downstream consumer app. The first consumer's requests, dependencies, and links are also missing from the timeline of the consumer app, so the chain of services that follows does not appear in that timeline: the second producer's and second consumer's traces, requests, and dependencies are all missing.
Environment
❯ dotnet --info
.NET SDK:
 Version: 7.0.400
 Commit: 73bf45718d
Runtime Environment:
 OS Name: Mac OS X
 OS Version: 13.4
 OS Platform: Darwin
 RID: osx.13-arm64
 Base Path: /usr/local/share/dotnet/sdk/7.0.400/
Host:
 Version: 7.0.10
 Architecture: arm64
 Commit: a6dbb800a4
.NET SDKs installed:
 7.0.400 [/usr/local/share/dotnet/sdk]
.NET runtimes installed:
 Microsoft.AspNetCore.App 7.0.10 [/usr/local/share/dotnet/shared/Microsoft.AspNetCore.App]
 Microsoft.NETCore.App 7.0.10 [/usr/local/share/dotnet/shared/Microsoft.NETCore.App]
Other architectures found:
 None
Environment variables:
 Not set
global.json file:
 Not found
Learn more: https://aka.ms/dotnet/info
Download .NET: https://aka.ms/dotnet/download
Issue Analytics
- Created a month ago
- Reactions:1
- Comments:5 (3 by maintainers)
Top GitHub Comments
I can confirm that the manual workaround works as intended.
@serpentfabric
I believe you're talking about two separate issues here:
1. EventHubBufferedProducerClient does not populate trace context on the messages - this is indeed a bug that we should fix.
2. […] EventProcessor.Process and EventProcessor.Checkpoint are captured and correlated means that we populated Activity.Current and it points to EventProcessor.Process - so anything that happens within it would be correlated.
Before we dive into p2, could you please try something:
[…] traceparent, please populate Diagnostic-Id (this is what is used by ApplicationInsights SDK integration)
eventData.Properties.TryAdd("Diagnostic-Id", Activity.Current?.Id);
Also, setting _telemetryClient.Context.Operation.ParentId = args.Properties["traceparent"] is quite error-prone - this sets parent for everything emitted with this _telemetryClient instance and would be wrong if there is any concurrency. This definitely should not be necessary and we expect that with us fixing p1 above or the manual workaround with Diagnostic-Id, it won't be needed.
Thanks!
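The Diagnostic-Id workaround suggested above could be wrapped in a small helper so that a missing ambient activity does not put a null value on the event. This is only a sketch: TraceContextWriter and TryStamp are hypothetical names, not part of the SDK, and the helper deliberately writes only the Diagnostic-Id property.

```csharp
using System.Collections.Generic;
using System.Diagnostics;

public static class TraceContextWriter
{
    // Hypothetical helper: copies the ambient Activity's id into the event's
    // application properties before the event is enqueued.
    // Returns false when there is no current Activity to propagate.
    public static bool TryStamp(IDictionary<string, object> properties)
    {
        Activity? current = Activity.Current;
        if (current?.Id is null)
            return false;

        // TryAdd leaves an existing value untouched, matching the snippet above.
        properties.TryAdd("Diagnostic-Id", current.Id);
        return true;
    }
}
```

It would be called on eventData.Properties while the originating activity is still current, just before EnqueueEventAsync(eventData).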