Elasticsearch operations produce span retrieve error in distributed transactions


Describe the bug

Hi,

I didn’t notice this with previous versions of APM, but right now, if I create a manual transaction from an existing distributed tracing ID with inner spans, each Elasticsearch call (NEST library) makes APM log errors such as: Failed to find current span in ConcurrentDictionary

To Reproduce

Steps to reproduce the behavior:

  1. App1 publishes a message to Kafka. In the message I set a distributed correlation ID using the following:

        public static string GetCorrelationLogsId() => 
                (Agent.Tracer.CurrentSpan?.OutgoingDistributedTracingData
                    ?? Agent.Tracer.CurrentTransaction?.OutgoingDistributedTracingData)?.SerializeToString();

Let’s say the returned string is 00-67fe36868e7ce8498bc0195b3a0acb4f-f867f0376eec7a49-01. It is then set in the message before the message is sent to Kafka.

  1. App2 then consumes the message and uses the correlation ID to open a new manual transaction:
            await Agent.Tracer
                .CaptureTransaction("myTransaction", ApiConstants.SubtypeElasticsearch, async (t) =>
                {
                    // ... some code
                    await t.CaptureSpan("PersistEntity", ApiConstants.SubtypeElasticsearch, async span =>
                    {
                        _logger.LogDebug("Persist entity in Elk with Id {id}", mymessage.Id);
                        await _elasticsearchRepository.Add(mymessage);  // produces errors!
                    });
                    // ... some code
                }, DistributedTracingData.TryDeserializeFromString(mymessage.DistributedTracingData));  // 00-67fe36868e7ce8498bc0195b3a0acb4f-f867f0376eec7a49-01
  1. The call to the Elasticsearch repository method produces the following logs:
[07:51:17 INF] {RequestPipelineDiagnosticsListener} Received an CallElasticsearch.Start event from elasticsearch
[07:51:17 INF] {SerializerDiagnosticsListener} Received an Serialize.Start event from elasticsearch
[07:51:17 ERR] {SerializerDiagnosticsListener} Failed to find current span in ConcurrentDictionary 00-4b8a560b65e8ff4cb4f6e376f526c3ac-5fcbad25ebedd341-01
[07:51:17 INF] {HttpConnectionDiagnosticsListener} Received an SendAndReceiveHeaders.Start event from elasticsearch
[07:51:17 ERR] {HttpConnectionDiagnosticsListener} Failed to find current span in ConcurrentDictionary 00-4b8a560b65e8ff4cb4f6e376f526c3ac-5fcbad25ebedd341-01
[07:51:17 INF] {HttpConnectionDiagnosticsListener} Received an ReceiveBody.Start event from elasticsearch
[07:51:17 INF] {SerializerDiagnosticsListener} Received an Deserialize.Start event from elasticsearch
[07:51:17 ERR] {SerializerDiagnosticsListener} Failed to find current span in ConcurrentDictionary 00-4b8a560b65e8ff4cb4f6e376f526c3ac-dc31ed64b33f0145-01
[07:51:17 ERR] {HttpConnectionDiagnosticsListener} Failed to find current span in ConcurrentDictionary 00-4b8a560b65e8ff4cb4f6e376f526c3ac-5fcbad25ebedd341-01
[07:51:17 ERR] {RequestPipelineDiagnosticsListener} Failed to find current span in ConcurrentDictionary 00-4b8a560b65e8ff4cb4f6e376f526c3ac-302847dec737e94c-01

=> The correlation IDs reported in the logs are not the same as the one used to initiate the transaction (i.e. the one transported in the message payload).

  1. The correlation ID nevertheless seems to be applied correctly: in the APM trace sample, both applications appear in the same transaction.
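The IDs quoted in the steps above all have the shape version-traceid-parentid-flags. As a minimal sketch for checking whether two such strings belong to the same trace, the snippet below splits them and compares the trace ID field. TraceParent is a hypothetical helper written for this illustration; it is not part of the Elastic APM agent API, and it validates field lengths only.

```csharp
using System;

// Hypothetical helper for illustration; not part of the Elastic APM agent API.
public sealed class TraceParent
{
    public string Version { get; }
    public string TraceId { get; }
    public string ParentId { get; }
    public string Flags { get; }

    private TraceParent(string version, string traceId, string parentId, string flags)
    {
        Version = version;
        TraceId = traceId;
        ParentId = parentId;
        Flags = flags;
    }

    // Splits "version-traceid-parentid-flags" and checks only the field lengths.
    public static TraceParent Parse(string value)
    {
        var parts = (value ?? string.Empty).Split('-');
        if (parts.Length != 4 || parts[0].Length != 2 || parts[1].Length != 32
            || parts[2].Length != 16 || parts[3].Length != 2)
        {
            throw new FormatException($"Not a traceparent-style value: '{value}'");
        }
        return new TraceParent(parts[0], parts[1], parts[2], parts[3]);
    }
}

public static class Program
{
    public static void Main()
    {
        // Value carried in the Kafka message (taken from the report above).
        var fromMessage = TraceParent.Parse("00-67fe36868e7ce8498bc0195b3a0acb4f-f867f0376eec7a49-01");
        // Value printed in one of the ERR log lines (taken from the report above).
        var fromLog = TraceParent.Parse("00-4b8a560b65e8ff4cb4f6e376f526c3ac-5fcbad25ebedd341-01");

        Console.WriteLine($"message trace id: {fromMessage.TraceId}");
        Console.WriteLine($"log trace id:     {fromLog.TraceId}");
        Console.WriteLine($"same trace: {fromMessage.TraceId == fromLog.TraceId}");
    }
}
```

For the two values from the report, the trace ID fields differ, which matches the observation that the logged IDs are not the one transported in the message payload.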

Versions and config

Here are the versions of the libraries currently in use:

    <TargetFramework>net5.0</TargetFramework>
    <PackageReference Include="Elastic.Apm.NetCoreAll" Version="1.7.1" />
    <PackageReference Include="Elastic.Apm.SerilogEnricher" Version="1.5.1" />
    <PackageReference Include="Elastic.CommonSchema.Serilog" Version="1.5.1" />
    <PackageReference Include="NEST" Version="7.10.1" />

And my appsettings.json

  "ElasticApm": {
    "ServerUrls": "http://apm:8200",
    "TransactionMaxSpans": 5000,
    "CaptureBody": "all",
    "CloudProvider": "none"
  }

Expected behavior

I expect no errors from APM, or to understand what I did wrong.

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Reactions:1
  • Comments:21 (8 by maintainers)

Top GitHub Comments

stevejgordon commented, Feb 17, 2021 (3 reactions)

@NicolasREY69330 - We’ve just pushed packages for 7.11.1 which include a fix for this bug.

russcam commented, Feb 17, 2021 (3 reactions)

I’ve opened https://github.com/elastic/elasticsearch-net/pull/5326 to address.

I’ve outlined the problem in the PR. It’s a rather pernicious and subtle bug that was introduced when the clients were updated to System.Diagnostics.DiagnosticSource. It stems from Activity now implementing Dispose() and setting Activity.Current to the parent Activity on Dispose(). That reset happens when the derived Diagnostic types in the client are disposed, and it happens before any Diagnostic Listeners are notified that the activity is stopping, because that notification is sent from the overridden Dispose(bool).
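The ordering described above can be sketched in isolation. The snippet below is not the client's code: NotifyingActivity is a hypothetical stand-in for the client's derived Diagnostic types, and it assumes a System.Diagnostics.DiagnosticSource version in which Activity implements IDisposable (5.0 or later). It records what Activity.Current points to at the moment the overridden Dispose(bool) runs, which is where the stop notification would be sent.

```csharp
using System;
using System.Diagnostics;

// Hypothetical stand-in for the client's derived Diagnostic types.
public sealed class NotifyingActivity : Activity
{
    public NotifyingActivity(string operationName) : base(operationName) { }

    // Operation name of Activity.Current when the "stop" notification would be sent.
    public string CurrentSeenOnStop { get; private set; }

    protected override void Dispose(bool disposing)
    {
        // Stand-in for notifying diagnostic listeners that the activity stopped.
        CurrentSeenOnStop = Activity.Current?.OperationName ?? "(none)";
        base.Dispose(disposing);
    }
}

public static class Program
{
    public static void Main()
    {
        // Starting an activity makes it Activity.Current and records the previous one as its parent.
        var parent = new Activity("parent").Start();
        var child = new NotifyingActivity("child");
        child.Start();

        Console.WriteLine($"before Dispose: {Activity.Current?.OperationName}");

        // Dispose() stops the activity first, then calls Dispose(bool).
        child.Dispose();

        Console.WriteLine($"seen inside Dispose(bool): {child.CurrentSeenOnStop}");

        parent.Stop();
    }
}
```

If the behavior is as the comment describes, the second line reports the parent rather than the child. A listener that looks up its span by the current activity's ID at that point would then search with the parent's ID, which is consistent with the "Failed to find current span in ConcurrentDictionary" errors in the report.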
