
[QUERY] Distributed tracing Activity scope for batched Service Bus Functions

Library name and version

Azure.Messaging.ServiceBus 7.13.1
Microsoft.Azure.WebJobs.Extensions.ServiceBus 5.9.0

Query/Question

There are currently discrepancies in Activity scoping, and in how Functions treat the Diagnostic-Id property of incoming messages, depending on whether Service Bus messages are processed in batches or one at a time.

In my scenario, one or more message producers send Service Bus messages to a given queue/topic, and each message may carry a different Diagnostic-Id. The expectation is that the producer-provided trace ID is used within the Function when processing the message, and is in turn propagated to any downstream components.

Now, when using a singular message Function (as below), this works perfectly.

public void ProcessIncomingMessage([ServiceBusTrigger(...)] ServiceBusReceivedMessage message)
{
    // Activity "ServiceBusProcessor.ProcessMessage"
    // uses incoming `Diagnostic-Id` ✅

    DoWork(message);
}

When using a batched message Function, however, the incoming Diagnostic-Ids are somewhat “squashed” at this level: the Activity uses a new ID (which is sent downstream), and the individual message trace IDs are added to Activity.Links, as below:

public void ProcessIncomingMessages([ServiceBusTrigger(...)] ServiceBusReceivedMessage[] messages)
{
    // Activity "ServiceBusListener.ProcessMessages"
    // uses new Id; incoming `Diagnostic-Id`s added to `Activity.Links` 🤔

    foreach (var message in messages)
    {
        DoWork(message);
    }
}
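
To check which producer trace IDs ended up as links on the batch Activity, something like the following could be called with Activity.Current from inside the batched Function. This is a minimal sketch that uses only System.Diagnostics; the BatchLinks class and GetLinkedTraceIds method are hypothetical names, not part of the Service Bus SDK or the Functions extension.

```csharp
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public static class BatchLinks
{
    // Hypothetical helper: returns the trace ID of every link attached to the
    // given Activity, or an empty list when there is no Activity at all.
    public static IReadOnlyList<string> GetLinkedTraceIds(Activity activity)
    {
        if (activity == null)
        {
            return new List<string>();
        }

        // Each ActivityLink wraps an ActivityContext carrying the linked trace ID.
        return activity.Links
            .Select(link => link.Context.TraceId.ToHexString())
            .ToList();
    }
}
```

Calling BatchLinks.GetLinkedTraceIds(Activity.Current) at the top of the batched Function and logging the result is one way to confirm that the per-message IDs are present only as links, assuming the batch Activity is the current one at that point.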

This seems backwards. I would expect the incoming Diagnostic-Ids to be used as the “primary” Activity ID, with the “batch” Activity added as a Link (or equivalent).

Whether or not the message consumer processes messages in batch or singularly could be considered an implementation detail (at least in this scenario) and the message producer shouldn’t care; it would expect that the Diagnostic-Id it provided in a given message is the trace ID used for any downstream actions performed in its processing.

Now, by this point it may seem obvious that such scoping is just not really feasible within a method written like this (with a ServiceBusReceivedMessage[] argument), and the workaround/solution is to simply perform this context flip manually within the batch processor:

foreach (var message in messages)
{
    using var activity = new Activity("ServiceBusProcessor.ProcessMessage");

    // preserve reference to the batch Activity (optional)
    // Activity.Current may be null if no batch Activity is active
    activity.AddBaggage("BatchActivityId", Activity.Current?.Id);

    // use incoming `Diagnostic-Id` (if present)
    if (message.ApplicationProperties.TryGetValue("Diagnostic-Id", out var value)
        && value is string diagnosticId
        && ActivityContext.TryParse(diagnosticId, null, out var parentContext))
    {
        activity.SetParentId(parentContext.TraceId, parentContext.SpanId, parentContext.TraceFlags);
    }

    activity.Start();

    DoWork(message);
}
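
The same context flip can be factored into a small helper so that it can be exercised outside of a Function. This is only a sketch under stated assumptions: MessageTracing and StartMessageActivity are hypothetical names, the helper takes the message's ApplicationProperties dictionary rather than the message itself so that it depends only on System.Diagnostics, and it assumes the Diagnostic-Id value is in W3C traceparent format.

```csharp
using System.Collections.Generic;
using System.Diagnostics;

public static class MessageTracing
{
    // Hypothetical helper (not part of any SDK): starts a per-message Activity
    // parented to the producer's Diagnostic-Id when one is present and parseable.
    public static Activity StartMessageActivity(
        IReadOnlyDictionary<string, object> applicationProperties)
    {
        var activity = new Activity("ServiceBusProcessor.ProcessMessage");

        // Keep a reference to the batch Activity, if one is currently active.
        var batchActivityId = Activity.Current?.Id;
        if (batchActivityId != null)
        {
            activity.AddBaggage("BatchActivityId", batchActivityId);
        }

        // Adopt the producer's trace context as the parent of this Activity.
        if (applicationProperties.TryGetValue("Diagnostic-Id", out var value)
            && value is string diagnosticId
            && ActivityContext.TryParse(diagnosticId, null, out var parentContext))
        {
            activity.SetParentId(
                parentContext.TraceId,
                parentContext.SpanId,
                parentContext.TraceFlags);
        }

        // Without an explicit parent, Start() parents to Activity.Current instead.
        return activity.Start();
    }
}
```

Inside the batch loop this might be used as `using var activity = MessageTracing.StartMessageActivity(message.ApplicationProperties);` followed by `DoWork(message);`, so that disposing the Activity at the end of each iteration stops it.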

I guess my main question at this point is whether or not this would be the suggested approach; or whether there’s another way of implementing a batch-triggered Function such that the trace IDs are registered as I would expect them to be (per-message). The documentation I could find on this subject is rather limited.

cc @JoshLove-msft #30279

Environment

.NET SDK 6.0.407

Issue Analytics

State:
Created 5 months ago
Comments: 19 (8 by maintainers)

Top GitHub Comments

JoshLove-msft commented, Apr 11, 2023

I should clarify that the behavior of using links with batches applies to both ActivitySource (which is still experimental) and the GA DiagnosticSource support. However, it was influenced by the OpenTelemetry spec. I will add a section to the guide that discusses the different behavior between batches and single messages.

github-actions[bot] commented, Apr 12, 2023

Hi @jacobjmarks. Thank you for opening this issue and giving us the opportunity to assist. We believe that this has been addressed. If you feel that further discussion is needed, please add a comment with the text “/unresolve” to remove the “issue-addressed” label and continue the conversation.