Elasticsearch operations produce span retrieve error in distributed transactions
See original GitHub issueDescribe the bug Hi,
I didn’t notice this with previous versions of APM, but right know if I create a manual transaction with an existing distributed tracing id with inner spans, on each Elasticsearch (Nest lib) I get the following errors from APM like
Failed to find current span in ConcurrentDictionary
To Reproduce
Steps to reproduce the behavior:
1.My use case is the following : App1
is publishing a message to Kafka, in the message I set a distributed correlationId
using the following :
public static string GetCorrelationLogsId() =>
(Agent.Tracer.CurrentSpan?.OutgoingDistributedTracingData
?? Agent.Tracer.CurrentTransaction?.OutgoingDistributedTracingData)?.SerializeToString();
Let’s say the string returned is 00-67fe36868e7ce8498bc0195b3a0acb4f-f867f0376eec7a49-01
, it’s then set into the message before sending it to Kafka.
- Then
App2
is consuming the message, and use the correlationId to open a new manual transaction
await Agent.Tracer
.CaptureTransaction("myTransaction", ApiConstants.SubtypeElasticsearch, async (t) =>
{
... some code
await t.CaptureSpan("PersistEntity", ApiConstants.SubtypeElasticsearch, async span =>
{
_logger.LogDebug("Persist entity in Elk with Id {id}", mymessage.Id);
await _elasticsearchRepository.Add(mymessage); // produces errors !
});
... some code
}, DistributedTracingData.TryDeserializeFromString(mymessage.DistributedTracingData)); // 00-67fe36868e7ce8498bc0195b3a0acb4f-f867f0376eec7a49-01
- The call of Elastic repository method produces the following logs
[07:51:17 INF] {RequestPipelineDiagnosticsListener} Received an CallElasticsearch.Start event from elasticsearch
[07:51:17 INF] {SerializerDiagnosticsListener} Received an Serialize.Start event from elasticsearch
[07:51:17 ERR] {SerializerDiagnosticsListener} Failed to find current span in ConcurrentDictionary 00-4b8a560b65e8ff4cb4f6e376f526c3ac-5fcbad25ebedd341-01
[07:51:17 INF] {HttpConnectionDiagnosticsListener} Received an SendAndReceiveHeaders.Start event from elasticsearch
[07:51:17 ERR] {HttpConnectionDiagnosticsListener} Failed to find current span in ConcurrentDictionary 00-4b8a560b65e8ff4cb4f6e376f526c3ac-5fcbad25ebedd341-01
[07:51:17 INF] {HttpConnectionDiagnosticsListener} Received an ReceiveBody.Start event from elasticsearch
[07:51:17 INF] {SerializerDiagnosticsListener} Received an Deserialize.Start event from elasticsearch
[07:51:17 ERR] {SerializerDiagnosticsListener} Failed to find current span in ConcurrentDictionary 00-4b8a560b65e8ff4cb4f6e376f526c3ac-dc31ed64b33f0145-01
[07:51:17 ERR] {HttpConnectionDiagnosticsListener} Failed to find current span in ConcurrentDictionary 00-4b8a560b65e8ff4cb4f6e376f526c3ac-5fcbad25ebedd341-01
[07:51:17 ERR] {RequestPipelineDiagnosticsListener} Failed to find current span in ConcurrentDictionary 00-4b8a560b65e8ff4cb4f6e376f526c3ac-302847dec737e94c-01
=> the correlation id reported in logs are not the same as the one used to initiate the transaction (ie : transported in message payload)
- The correlation Id seems to be correctly used thanks to the APM trace sample since both applications appear in the same transaction
Versions and config Here are the current versions of used libs
<TargetFramework>net5.0</TargetFramework>
<PackageReference Include="Elastic.Apm.NetCoreAll" Version="1.7.1" />
<PackageReference Include="Elastic.Apm.SerilogEnricher" Version="1.5.1" />
<PackageReference Include="Elastic.CommonSchema.Serilog" Version="1.5.1" />
<PackageReference Include="NEST" Version="7.10.1" />
And my appsettings.json
"ElasticApm": {
"ServerUrls": "http://apm:8200",
"TransactionMaxSpans": 5000,
"CaptureBody": "all",
"CloudProvider": "none"
}
Expected behavior I expect no error from APM or to understand what I did wrong
Issue Analytics
- State:
- Created 3 years ago
- Reactions:1
- Comments:21 (8 by maintainers)
Top GitHub Comments
@NicolasREY69330 - We’ve just pushed packages for 7.11.1 which include a fix for this bug.
I’ve opened https://github.com/elastic/elasticsearch-net/pull/5326 to address.
I’ve outlined the problem in the PR; it’s a rather pernicious and subtle bug that was introduced when the clients were updated to System.Diagnostics.DiagnosticSource, and stems from
Activity
now implementingDispose()
and settingActivity.Current
to the parentActivity
onDispose()
, which ends up happening when the derivedDiagnostic
types in the client are disposed, and before it notifies any Diagnostic Listeners of the stopping of the activity that happens in the overriddenDispose(bool)
.