[BUG] distributed tracing inconsistency

Library name and version

Azure.Messaging.EventHubs 5.9.2

Describe the bug

EventHubBufferedProducerClient.EnqueueEventAsync(eventData); is used to enqueue a message. I think it’s safe to assume that the activity that enqueued the message has since ended and is no longer in scope.

The following were all set when the message was enqueued:

AppContext.SetSwitch("Azure.Experimental.EnableActivitySource", true);
eventData.Properties.TryAdd("traceparent", Activity.Current?.Id);
eventData.Properties.TryAdd("tracestate", Activity.Current?.TraceStateString);

The distributed traces in Application Insights do not link the consumer traces to the producer traces unless I manually set _telemetryClient.Context.Operation.ParentId = args.Properties["traceparent"] in the corresponding consumer service application. If there is a chain of services downstream, the same holds for them: neither the producer nor the consumer traces that follow show up in the distributed trace.

Expected behavior

I feel like the SDK has all the information it needs to reconstruct the activity, so that consumer clients don’t need to set _telemetryClient.Context.Operation.ParentId = args.Properties["traceparent"]. Moreover, any traces/requests/dependencies emitted before I have the opportunity to set that ParentId would not be correlated.
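For context, rebuilding a parent context from the stored values is possible with System.Diagnostics alone. The following is a minimal sketch, not the SDK's implementation: the source name "Example.Consumer", the helper names, and the plain dictionary standing in for the event's property bag are all assumptions made for illustration.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class TraceContextSketch
{
    // Hypothetical source name, used only for this illustration.
    private static readonly ActivitySource Source = new ActivitySource("Example.Consumer");

    // Without a listener that samples the source, StartActivity returns null.
    public static ActivityListener ListenToAll()
    {
        var listener = new ActivityListener
        {
            ShouldListenTo = source => true,
            Sample = (ref ActivityCreationOptions<ActivityContext> options) =>
                ActivitySamplingResult.AllData
        };
        ActivitySource.AddActivityListener(listener);
        return listener;
    }

    // Reads "traceparent"/"tracestate" from a property bag (a stand-in for the
    // event's properties) and starts a consumer activity parented to that context.
    // Returns null when the header is missing or is not a valid W3C traceparent.
    public static Activity? StartFromProperties(IDictionary<string, object> properties)
    {
        var traceparent = properties.TryGetValue("traceparent", out var tp) ? tp as string : null;
        var tracestate = properties.TryGetValue("tracestate", out var ts) ? ts as string : null;

        if (traceparent is null ||
            !ActivityContext.TryParse(traceparent, tracestate, out var parent))
        {
            return null;
        }

        return Source.StartActivity("Process", ActivityKind.Consumer, parent);
    }
}
```

An activity started this way shares the producer's trace ID, and because it becomes Activity.Current, telemetry emitted while it is running is parented to it without touching any shared client state.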

It should also probably be documented that EventHubBufferedProducerClient requires those three settings (the AppContext switch and the two properties) for full distributed tracing to work with a buffered producer. It wasn’t documented anywhere I could find, and I had to read the source code to figure it out.

Actual behavior

Without setting _telemetryClient.Context.Operation.ParentId = args.Properties["traceparent"], the distributed trace looks like this:

[image: screenshot of the distributed trace]

None of the distributed tracing that would include or follow that consumer service and its downstream services appears.

Reproduction Steps

Use EventHubBufferedProducerClient.EnqueueEventAsync(eventData); to enqueue a message. Ensure that the activity used to enqueue that message has since ended and is no longer in scope.

Ensure the following are all set when the message is enqueued:

AppContext.SetSwitch("Azure.Experimental.EnableActivitySource", true);
eventData.Properties.TryAdd("traceparent", Activity.Current?.Id);
eventData.Properties.TryAdd("tracestate", Activity.Current?.TraceStateString);

Use a consumer service application (in our case, one built on EventProcessorClient) to consume a message from that Event Hub and then write it to another Event Hub. Then check the transactions in Application Insights for the original request. The traces for the first producer and first consumer app are there, but not those of the downstream consumer app. The first consumer’s requests/dependencies/links are also missing from the consumer app’s timeline, so the chain of services that follows does not appear there: the second producer’s and second consumer’s traces, requests, and dependencies are all missing.

Environment

❯ dotnet --info
.NET SDK:
  Version: 7.0.400
  Commit: 73bf45718d

Runtime Environment:
  OS Name: Mac OS X
  OS Version: 13.4
  OS Platform: Darwin
  RID: osx.13-arm64
  Base Path: /usr/local/share/dotnet/sdk/7.0.400/

Host:
  Version: 7.0.10
  Architecture: arm64
  Commit: a6dbb800a4

.NET SDKs installed:
  7.0.400 [/usr/local/share/dotnet/sdk]

.NET runtimes installed:
  Microsoft.AspNetCore.App 7.0.10 [/usr/local/share/dotnet/shared/Microsoft.AspNetCore.App]
  Microsoft.NETCore.App 7.0.10 [/usr/local/share/dotnet/shared/Microsoft.NETCore.App]

Other architectures found:
  None

Environment variables:
  Not set

global.json file:
  Not found

Learn more:
  https://aka.ms/dotnet/info

Download .NET:
  https://aka.ms/dotnet/download

Issue Analytics

  • State: closed
  • Created a month ago
  • Reactions: 1
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

serpentfabric commented, Aug 18, 2023 (2 reactions)

I can confirm that the manual workaround works as intended.

lmolkova commented, Aug 17, 2023 (2 reactions)

@serpentfabric

I believe you’re talking about two separate issues here:

  1. EventHubBufferedProducerClient does not populate trace context on the messages - this is indeed a bug that we should fix.
  2. Downstream calls on the consumer are not correlated to the message processing. This is something we don’t quite understand: the fact that EventProcessor.Process and EventProcessor.Checkpoint are captured and correlated means that we populated Activity.Current and it points to EventProcessor.Process, so anything that happens within it would be correlated.

Before we dive into p2, could you please try something:

  • If you’re using ApplicationInsights SDK, please remove AppContext switches - they are only needed when using OpenTelemetry
  • Instead of manually populating traceparent, please populate Diagnostic-Id (this is what is used by ApplicationInsights SDK integration) eventData.Properties.TryAdd("Diagnostic-Id", Activity.Current?.Id);
  • Please provide an example of the trace you see, and share what should be there but is not correlated.
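The Diagnostic-Id suggestion above can be sketched as a small helper. This is an illustration under stated assumptions, not SDK code: the class and method names are hypothetical, and a plain dictionary stands in for eventData.Properties. Unlike the one-liner, it skips the write when there is no current activity, so a null value is never stored on the event.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class DiagnosticIdSketch
{
    // Copies the current activity's ID into the property bag under "Diagnostic-Id".
    // Returns false when there is no current activity or the key already exists.
    public static bool TryStamp(IDictionary<string, object> properties)
    {
        var id = Activity.Current?.Id;
        if (id is null)
        {
            // Nothing to propagate; avoid storing a null value on the event.
            return false;
        }

        return properties.TryAdd("Diagnostic-Id", id);
    }
}
```

With the real client, the call would presumably sit immediately before EnqueueEventAsync and still inside the activity being propagated, for example DiagnosticIdSketch.TryStamp(eventData.Properties).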

Also, setting _telemetryClient.Context.Operation.ParentId = args.Properties["traceparent"] is quite error-prone - this sets parent for everything emitted with this _telemetryClient instance and would be wrong if there is any concurrency. This definitely should not be necessary and we expect that with us fixing p1 above or the manual workaround with Diagnostic-Id, it won’t be needed.
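To illustrate the concurrency concern: a value assigned on a shared client instance is visible to every concurrent handler, whereas Activity.Current flows with each asynchronous call chain. The sketch below uses only System.Diagnostics; the class, method, and operation names are hypothetical and it does not involve the Application Insights SDK.

```csharp
#nullable enable
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class AmbientContextSketch
{
    // Runs two handlers concurrently. Each starts its own activity and, after an
    // await, checks that Activity.Current is still the activity it started.
    public static async Task<bool> RunAsync()
    {
        static async Task<bool> Handle(string name)
        {
            using var activity = new Activity(name).Start();
            await Task.Delay(10); // lets the two handlers interleave
            return ReferenceEquals(Activity.Current, activity);
        }

        var results = await Task.WhenAll(Handle("partition-0"), Handle("partition-1"));
        return results[0] && results[1];
    }
}
```

Each handler keeps its own ambient activity across the await, which is why correlation based on Activity.Current stays correct under concurrency while a single ParentId on a shared client cannot.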

Thanks!
