[BUG] APM is not properly flushed when a Worker Services app is exiting

APM Agent version

The version of the Elastic.Apm NuGet package used: 1.18.0

Environment

Operating system and version: Windows 10 and Docker with WSL 2

.NET Framework/Core name and version: .NET SDK 6/7 (mcr.microsoft.com/dotnet/aspnet:6.0 when on docker)

Application Target Framework(s): net6.0/net7.0

Describe the bug

When the application ends before APM has flushed all its buffered messages, an exception is thrown.

To Reproduce

Steps to reproduce the behavior:

  1. Create an empty Worker service
  2. Install and configure Elastic.APM.NetCoreAll 1.18.0
  3. In Program.cs, call UseAllElasticApm() on IHostBuilder:
using APMTest;
using Elastic.Apm.NetCoreAll;

IHost host = Host.CreateDefaultBuilder(args)
    .ConfigureServices(services =>
    {
        services.AddHostedService<Worker>();
    })
    .UseAllElasticApm()
    .Build();

await host.RunAsync();

  4. In ExecuteAsync, terminate the application after a short delay:
namespace APMTest
{
    public class Worker : BackgroundService
    {
        private readonly ILogger<Worker> _logger;
        private readonly IHostApplicationLifetime _hostApplicationLifetime;

        public Worker(ILogger<Worker> logger, IHostApplicationLifetime hostApplicationLifetime)
        {
            _logger = logger;
            _hostApplicationLifetime = hostApplicationLifetime;
        }

        protected override async Task ExecuteAsync(CancellationToken stoppingToken)
        {
            _logger.LogInformation("Worker running at: {time}", DateTimeOffset.Now);
            await Task.Delay(5000, stoppingToken);
            _hostApplicationLifetime.StopApplication();
        }

        public override async Task StopAsync(CancellationToken cancellationToken)
        {
            _logger.LogInformation("Exiting application...");
            await base.StopAsync(cancellationToken);
        }
    }
}

Expected behavior

Either the application exits gracefully and outputs the messages that were not sent, or it waits for the APM flush to complete.

Example of the output when the delay before stopping is longer (e.g. 30-60+ seconds):

...
info: APMTest.Worker[0]
      Exiting application...
warn: Elastic.Apm[0]
      {PayloadSenderV2} Cancellation requested. Following events were not transferred successfully to the server (https://.../):
          Elastic.Apm.Metrics.MetricSet,
          Elastic.Apm.Metrics.MetricSet,
          Elastic.Apm.Metrics.MetricSet,
          Elastic.Apm.Metrics.MetricSet,
          Elastic.Apm.Metrics.MetricSet

Actual behavior

An error occurs:

...
info: APMTest.Worker[0]
      Exiting application...
fail: Elastic.Apm[0]
      {BackendCommComponentBase (PayloadSenderV2)} WorkLoop Current thread: `ElasticApmPayloadSenderV2' (managed ID: 10)
      System.InvalidOperationException: The source completed without providing data to receive.
         at System.Threading.Tasks.Dataflow.Internal.Common.InitializeStackTrace(Exception exception)
      --- End of stack trace from previous location ---
         at System.Threading.Tasks.Dataflow.DataflowBlock.Receive[TOutput](ISourceBlock`1 source, TimeSpan timeout, CancellationToken cancellationToken)
         at Elastic.Apm.Report.PayloadSenderV2.ReceiveBatch()
         at Elastic.Apm.Report.PayloadSenderV2.WorkLoopIteration()
         at Elastic.Apm.BackendComm.BackendCommComponentBase.WorkLoop()
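
The exception type in this trace can be reproduced without the APM agent. The following is a minimal sketch, assuming (this is not confirmed anywhere in the issue) that the agent's internal queue had been marked complete while empty by the time ReceiveBatch() called Receive. The BufferBlock and variable names are illustrative only and are not taken from the agent's source.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks.Dataflow;

// Illustrative stand-in for an event queue; not the agent's actual field.
var queue = new BufferBlock<string>();

// Assumption: shutdown completes the block while no events are buffered.
queue.Complete();

try
{
    // Same overload as in the stack trace: Receive(source, timeout, cancellationToken).
    // On a completed, empty source this throws instead of waiting for the timeout.
    queue.Receive(TimeSpan.FromSeconds(1), CancellationToken.None);
}
catch (InvalidOperationException ex)
{
    Console.WriteLine($"{ex.GetType().Name}: {ex.Message}");
}
```

If this matches what happens inside the agent, the failure is a race between shutdown and the sender's work loop rather than a problem in the Worker code itself.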

Additional observation

On Elastic.APM.NetCoreAll version 1.14.0, the application does not output an error. Instead, it hangs indefinitely without exiting and uses the equivalent of one CPU core.

Issue Analytics

  • State: closed
  • Created 10 months ago
  • Reactions: 3
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

1 reaction
z1c0 commented, Nov 30, 2022

Hi @lazar-boradzhiev, sorry, I cannot give you a definite ETA for the Flush API, but this is definitely something we will be working on soon. Please keep watching #288.

I will close this issue though as it is not actually a bug but rather unspecified agent behavior. The solution will be the mentioned Flush API. I hope you can agree to that approach.

0 reactions
lazar-boradzhiev commented, Nov 24, 2022

Hey, @z1c0, I tried to dig in and understand if there is some easy way to mitigate the issue here, but to no avail. I suppose we will stick to a longer wait duration before exiting until the Flush API is released. Is there any ETA for it?
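
The "longer wait duration" workaround mentioned in this comment could look roughly like the sketch below, which adapts the Worker from the reproduction steps. The ExitGracePeriod name and its 30-second value are assumptions for illustration: the value comes from the low end of the 30-60+ second range reported in the issue, not from any documented agent setting. Judging by the log in the "Expected behavior" section, a longer wait appears to replace the exception with a warning, but buffered events may still be dropped.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Hosting;
using Microsoft.Extensions.Logging;

namespace APMTest
{
    public class Worker : BackgroundService
    {
        // Hypothetical grace period; tune it for your environment.
        private static readonly TimeSpan ExitGracePeriod = TimeSpan.FromSeconds(30);

        private readonly ILogger<Worker> _logger;
        private readonly IHostApplicationLifetime _hostApplicationLifetime;

        public Worker(ILogger<Worker> logger, IHostApplicationLifetime hostApplicationLifetime)
        {
            _logger = logger;
            _hostApplicationLifetime = hostApplicationLifetime;
        }

        protected override async Task ExecuteAsync(CancellationToken stoppingToken)
        {
            _logger.LogInformation("Worker running at: {time}", DateTimeOffset.Now);

            // ... the actual work of the service would run here ...

            // Wait before requesting shutdown so the agent's sender has time
            // to work through its queue. This is a stopgap, not a real flush.
            await Task.Delay(ExitGracePeriod, stoppingToken);
            _hostApplicationLifetime.StopApplication();
        }
    }
}
```

Because the delay is passed the stoppingToken, an external shutdown request (for example Ctrl+C) still cancels the wait immediately, so this only helps when the worker itself decides to exit.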
