Activities not mapping to the expected parents
Describe your environment.
I have a WPF application (.NET Framework 4.7.1) to which we are exploring adding telemetry. This application uses HttpClient to send various web requests to a WebAPI backend. Eventually, this will expand to include the backend as well, but for now, we are just looking at the UI client.
What are you trying to achieve?
All of our web calls ultimately funnel through a single async method, which is where we call StartActivity. Inside that method, we then call HttpClient.PostAsync. We have the data transmitting to a Jaeger instance. What we expected to see is that our explicitly created StartActivity activities would be top-level items and that the activities generated by HttpClient would appear inside them.
However, what we are finding seems Very Random™. Sometimes our activity is a top-level item, sometimes the HttpClient item is top-level. Sometimes our activities have other “our activities” nested under them (which conceptually is not the case). In fact, I have yet to find a case where it worked “correctly” (that is, where one of our activities contained its one HTTP call and nothing else).
This async method I referred to is called from a number of different threads and these web calls will certainly overlap. It almost feels like the OpenTelemetry framework is not consistently picking up the correct parent activity because of this?
Additional Context
Example code is below that spins up 20 parallel calls to the HttpClient, each wrapped in a manually created activity, named with the numbers 1-20.
What I expected to see was 1-20 all as top-level items in Jaeger, each with one subitem for its HTTP call. Instead, many top-level items have no subitems, and some have several. For example, in this screenshot, number 11 happened to get a bunch of the HTTP calls (but not all):
using System.Diagnostics;
using System.Linq;
using System.Net.Http;
using System.Reflection;
using System.Threading.Tasks;
using OpenTelemetry;
using OpenTelemetry.Trace;

namespace ConsoleApp3
{
    class Program
    {
        static ActivitySource activitySource = new ActivitySource(Assembly.GetExecutingAssembly().GetName().Name, Assembly.GetExecutingAssembly().GetName().Version.ToString());
        private static HttpClient client = new HttpClient();

        static void Main(string[] args)
        {
            using var tracerProvider = Sdk.CreateTracerProviderBuilder()
                .AddSource(Assembly.GetExecutingAssembly().GetName().Name)
                .AddHttpClientInstrumentation()
                .AddJaegerExporter()
                .Build();
            Task.WaitAll(Enumerable.Range(1, 20).Select(MakeWebCall).ToArray());
        }

        private static async Task MakeWebCall(int id)
        {
            using var activity = activitySource.StartActivity(id.ToString(), ActivityKind.Client);
            using HttpResponseMessage msg = await client.PostAsync($"http://localhost/{id}", new StringContent("dummy data")).ConfigureAwait(false);
        }
    }
}
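To check the parent/child relationships without going through Jaeger, one option is to attach an ActivityListener and print each activity's parent as it stops. The following is only a sketch: the ActivityParentLogger class and its Attach method are hypothetical names, and it assumes the HTTP activities are created through an ActivitySource (legacy activities created without one would not show up here).

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper (not part of the original repro): logs every stopped
// activity together with its parent so mismatches are visible in the console.
static class ActivityParentLogger
{
    // Call once at startup; dispose the returned listener on shutdown.
    public static ActivityListener Attach()
    {
        var listener = new ActivityListener
        {
            // Listen to every ActivitySource in the process.
            ShouldListenTo = source => true,
            // Ask for full data so activities are always created and recorded.
            Sample = (ref ActivityCreationOptions<ActivityContext> options) =>
                ActivitySamplingResult.AllData,
            // Print the source, the name, and the recorded parent of each activity.
            ActivityStopped = activity =>
                Console.WriteLine(
                    $"{activity.Source.Name}: {activity.DisplayName} " +
                    $"id={activity.Id} parent={activity.ParentId ?? "(none)"}"),
        };

        ActivitySource.AddActivityListener(listener);
        return listener;
    }
}
```

With this attached at the top of Main, each HTTP activity should print a parent id that belongs to exactly one of the numbered activities; several HTTP activities sharing one parent would reproduce the grouping seen in Jaeger.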
Were you ever able to track down a workaround for this? Thanks!
Hello @cabadam, Cijo asked me to respond because we recently worked with another customer facing the same issue, where telemetry items were getting ‘mislabeled’ operation IDs.
This is due to an issue somewhere in the System.Diagnostics.DiagnosticSource package, in code written to support Activities on the older desktop framework versions. We’ve seen scenarios where an Activity object gets assigned to the wrong async thread, so some telemetry items end up with the wrong distributed telemetry operation ID.
This problem impacts Application Insights, OpenTelemetry, or any Diagnostic Listener implementation that is running on .NET 4.x frameworks. The injected diagnostic listeners don’t account for the behavior within .NET 4.x where the framework might chain calls out to the same URI on the same connection in a single async context.
If you move the code over to .NET Core 3.1 or .NET 5.0 or later then you will avoid this problem. This only impacts 4.x apps.
One of our software architects came up with a workaround by supplying a custom HttpHandler to the HttpClient which will
I think you’ll be able to pass this custom handler, QueueHttpHandler, directly into your HttpClient constructor.
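The handler's code is not reproduced in this thread, so the following is only a hypothetical stand-in that shows how a custom handler is wired into the HttpClient constructor. It assumes, going by the name alone, that QueueHttpHandler lets one request through at a time; the real workaround may do something different, and nothing here confirms that serializing requests is what fixes the parenting.

```csharp
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch: the original QueueHttpHandler is not shown in the thread.
// ASSUMPTION: it queues requests so only one is in flight at a time.
public sealed class QueueHttpHandler : DelegatingHandler
{
    private readonly SemaphoreSlim gate = new SemaphoreSlim(1, 1);

    public QueueHttpHandler(HttpMessageHandler innerHandler)
        : base(innerHandler)
    {
    }

    protected override async Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        // Wait for the previous request to finish before sending this one.
        await gate.WaitAsync(cancellationToken).ConfigureAwait(false);
        try
        {
            return await base.SendAsync(request, cancellationToken).ConfigureAwait(false);
        }
        finally
        {
            gate.Release();
        }
    }
}

public static class Clients
{
    // The custom handler wraps the default HttpClientHandler and is passed
    // straight into the HttpClient constructor.
    public static readonly HttpClient Shared =
        new HttpClient(new QueueHttpHandler(new HttpClientHandler()));
}
```

Serializing requests like this trades throughput for ordering, so it would be worth measuring under a realistic load before adopting it.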
If you are doing dependency injection, like our earlier customer, then something like this will help configure the handler too:
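The original registration snippet is missing from this thread. As a rough sketch of what such a registration can look like with IHttpClientFactory (the Microsoft.Extensions.Http package), the code below registers a handler named QueueHttpHandler for a named client. The handler body, the client name "backend", and the AddQueuedHttpClient extension method are all assumptions made for illustration, not the commenter's code.

```csharp
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.DependencyInjection;

// Hypothetical stand-in for the handler; ASSUMES it serializes requests.
// The gate is static because IHttpClientFactory creates new handler
// instances over time, and they would otherwise not share a queue.
public sealed class QueueHttpHandler : DelegatingHandler
{
    private static readonly SemaphoreSlim Gate = new SemaphoreSlim(1, 1);

    protected override async Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        await Gate.WaitAsync(cancellationToken).ConfigureAwait(false);
        try
        {
            return await base.SendAsync(request, cancellationToken).ConfigureAwait(false);
        }
        finally
        {
            Gate.Release();
        }
    }
}

public static class HttpClientRegistration
{
    // Hypothetical extension method; "backend" is an illustrative client name.
    public static IServiceCollection AddQueuedHttpClient(this IServiceCollection services)
    {
        // Handlers added through AddHttpMessageHandler are resolved from DI,
        // so the handler type must be registered (transient is the usual choice).
        services.AddTransient<QueueHttpHandler>();
        services.AddHttpClient("backend")
                .AddHttpMessageHandler<QueueHttpHandler>();
        return services;
    }
}
```

Consumers would then resolve IHttpClientFactory and call CreateClient("backend") to get a client whose requests pass through the handler.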
Hope this helps. Let us know if the workaround works for you too. We tested it out earlier and saw the workaround hold up really well under load.