[BUG] APM is not properly flushed when a Worker Services app is exiting
APM Agent version
The version of the Elastic.Apm NuGet package used: 1.18.0
Environment
Operating system and version: Windows 10 and Docker with WSL 2
.NET Framework/Core name and version: .NET SDK 6/7 (mcr.microsoft.com/dotnet/aspnet:6.0 when on docker)
Application Target Framework(s): net6.0/net7.0
Describe the bug
When the application ends while the APM agent has not yet flushed all its buffered messages, an exception is thrown.
To Reproduce
Steps to reproduce the behavior:
- Create an empty Worker service
- Install and configure Elastic.APM.NetCoreAll 1.18.0
- In Program.cs call UseAllElasticApm() on IHostBuilder
using APMTest;
using Elastic.Apm.NetCoreAll;

IHost host = Host.CreateDefaultBuilder(args)
    .ConfigureServices(services =>
    {
        services.AddHostedService<Worker>();
    })
    .UseAllElasticApm()
    .Build();

await host.RunAsync();
- In ExecuteAsync, terminate the application after a short delay
namespace APMTest
{
    public class Worker : BackgroundService
    {
        private readonly ILogger<Worker> _logger;
        private readonly IHostApplicationLifetime _hostApplicationLifetime;

        public Worker(ILogger<Worker> logger, IHostApplicationLifetime hostApplicationLifetime)
        {
            _logger = logger;
            _hostApplicationLifetime = hostApplicationLifetime;
        }

        protected override async Task ExecuteAsync(CancellationToken stoppingToken)
        {
            _logger.LogInformation("Worker running at: {time}", DateTimeOffset.Now);
            await Task.Delay(5000, stoppingToken);
            _hostApplicationLifetime.StopApplication();
        }

        public override async Task StopAsync(CancellationToken cancellationToken)
        {
            _logger.LogInformation("Exiting application...");
            await base.StopAsync(cancellationToken);
        }
    }
}
Expected behavior
Either the application exits gracefully and outputs the messages that were not sent, or it waits for the APM flush to complete.
Example of what happens when the await duration is longer (e.g. 30-60+ seconds):
...
info: APMTest.Worker[0]
Exiting application...
warn: Elastic.Apm[0]
{PayloadSenderV2} Cancellation requested. Following events were not transferred successfully to the server (https://.../):
Elastic.Apm.Metrics.MetricSet,
Elastic.Apm.Metrics.MetricSet,
Elastic.Apm.Metrics.MetricSet,
Elastic.Apm.Metrics.MetricSet,
Elastic.Apm.Metrics.MetricSet
Actual behavior
An error occurs:
...
info: APMTest.Worker[0]
Exiting application...
fail: Elastic.Apm[0]
{BackendCommComponentBase (PayloadSenderV2)} WorkLoop Current thread: `ElasticApmPayloadSenderV2' (managed ID: 10)
System.InvalidOperationException: The source completed without providing data to receive.
at System.Threading.Tasks.Dataflow.Internal.Common.InitializeStackTrace(Exception exception)
--- End of stack trace from previous location ---
at System.Threading.Tasks.Dataflow.DataflowBlock.Receive[TOutput](ISourceBlock`1 source, TimeSpan timeout, CancellationToken cancellationToken)
at Elastic.Apm.Report.PayloadSenderV2.ReceiveBatch()
at Elastic.Apm.Report.PayloadSenderV2.WorkLoopIteration()
at Elastic.Apm.BackendComm.BackendCommComponentBase.WorkLoop()
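For context, the exception type and message in the trace above are what System.Threading.Tasks.Dataflow reports when Receive is called on a source block that has been completed while empty. The following is a minimal standalone sketch of that Dataflow behavior only. It is not the agent's code: the class name ReceiveAfterCompleteDemo and the use of a BufferBlock<string> are illustrative assumptions, and whether the agent's internal queue is completed in exactly this way during shutdown is not confirmed by this report.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks.Dataflow;

// Hypothetical demo class, not part of Elastic.Apm.
public static class ReceiveAfterCompleteDemo
{
    public static string Run()
    {
        // Assumption: an empty queue that is marked complete,
        // standing in for a sender queue during shutdown.
        var queue = new BufferBlock<string>();
        queue.Complete();

        try
        {
            // Blocking receive with a timeout, the same overload
            // that appears in the stack trace above.
            queue.Receive(TimeSpan.FromSeconds(5), CancellationToken.None);
            return "received";
        }
        catch (InvalidOperationException ex)
        {
            // The block completed without ever providing an item.
            return ex.GetType().Name;
        }
    }

    public static void Main() => Console.WriteLine(Run());
}
```

If this matches what happens inside the agent, a caller that wanted a quiet shutdown would have to treat this InvalidOperationException as an expected end-of-stream signal rather than a failure.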
Additional observation
With Elastic.APM.NetCoreAll version 1.14.0, the application does not output an error. Instead it hangs indefinitely, never exiting, and uses the equivalent of one CPU core.
Top GitHub Comments
Hi @lazar-boradzhiev, sorry, I cannot give you a definite ETA for the Flush API, but this is definitely something we will be working on soon. Please keep watching #288.
I will close this issue though, as it is not actually a bug but rather unspecified agent behavior. The solution will be the mentioned Flush API. I hope you can agree to that approach.
Hey, @z1c0, I tried to dig in and understand if there is some easy way to mitigate the issue here, but to no avail. I suppose we will stick to a longer wait duration before exiting until the Flush API is released. Is there any ETA for it?