Experiencing a Memory Leak
Describe the bug
Several, but not all, tested .NET Framework services have what appears to be a memory leak.
To Reproduce
Steps to reproduce the behavior:
- Grab existing .NET Framework WebAPI service.
- Install NuGet package Elastic.Apm.AspNetFullFramework 1.3.0
- Add to the web config modules section:
<add name="ElasticApmModule" type="Elastic.Apm.AspNetFullFramework.ElasticApmModule, Elastic.Apm.AspNetFullFramework" />
- Add to the web config app settings section:
<add key="ElasticApm:ServerUrls" value="http://apmhost:8200" />
<add key="ElasticApm:ServiceName" value="ServiceName" />
<add key="ElasticApm:ServiceVersion" value="2020.02.25.01" />
<add key="ElasticApm:Environment" value="Production" />
<add key="ElasticApm:TransactionSampleRate" value="0.25" />
- On controller methods that have parameters in the path, update the transaction name on the first line of the method body, replacing the parameter values with hard-coded text so that transactions group better.
- Deploy service to IIS 10 server 2016
- Send 10 - 20 requests per minute to service
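The transaction-rename step above can be sketched with the agent's public API (`Agent.Tracer.CurrentTransaction`); the controller name and route here are hypothetical, made up for illustration:

```csharp
using System.Web.Http;
using Elastic.Apm;

// Hypothetical Web API controller illustrating the rename step.
public class OrdersController : ApiController
{
    [HttpGet]
    public IHttpActionResult Get(int id)
    {
        // Replace the path parameter in the transaction name with fixed
        // text so all requests to this route group under a single name.
        var transaction = Agent.Tracer.CurrentTransaction;
        if (transaction != null)
        {
            transaction.Name = "GET Orders/{id}";
        }

        return Ok();
    }
}
```

This runs once per request on the first line of the action body, as described in the reproduction steps.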
Expected behavior
Memory usage remains unchanged or rises slightly; it should not grow without bound.
I have tried to recreate this issue in our dev environment, even sending thousands of requests per minute to speed things up, but have not been able to reproduce it. We are experiencing this on 2 of the 3 .NET Framework services we have in production; 2 .NET Core services in production don’t seem to be having the issue. We have since pulled back 1 of the 2 from production and just today tried the 2nd one in production to confirm the results.
The one deployed today only had the steps above as the single changeset.
No lag in traffic to the APM server is observed. All transactions appear to flow to the APM server quickly, and the rate of transactions showing up there is as expected, so nothing suggests that events are getting clogged. I thought of #387, but wouldn’t that only apply if events were backing up?
As of writing this issue, the remaining production service has a memory footprint of over 2 GB. IIS worker pool snapshot: the highlighted row represents the service we deployed today; the yellow highlight a few rows below represents the other .NET Framework service that has been out there for about a week now with no issues. The service does not share its app pool. Recycling the service or app pool immediately resets memory usage.
Kibana server memory usage: The line drawn is when we promoted the problem service to production.
We first encountered this issue last Friday and rolled the service back to investigate. We were tipped off by high memory usage on the server, CPU spikes, and timeouts for clients trying to talk to that service. APM logs from that service at that time showed some spans of 36 seconds and higher, even though other external data sources indicated the duration of those spans should have been much lower (under a second). Rolling back resolved the issue.
I’ll continue trying stuff out in our dev environment to try to recreate it there.
Please advise
Issue Analytics
- State:
- Created 4 years ago
- Comments: 11 (5 by maintainers)
Top GitHub Comments
Hi,
I experience the same error using Elastic.Apm.AspNetFullFramework 1.12.1 in two .NET Framework applications.
These are my settings:
Application 1:
Application 2:
The cause seems to be the GC metrics. I can tell that by running a memory profiler and seeing ETWTraceEventSource retaining a lot of memory. The memory profiler also reports an event-handler leak for ElasticApmModule but doesn’t provide more details…
I will try the temporary fix (<add key="ElasticApm:DisableMetrics" value="clr.gc.*" />) but this is still an active issue that needs to be reinvestigated.
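For reference, a sketch of where the workaround above would sit in web.config, alongside the agent keys from the original report (the other values shown are assumptions carried over from that report):

```xml
<appSettings>
  <!-- Existing agent settings (from the original report) -->
  <add key="ElasticApm:ServerUrls" value="http://apmhost:8200" />
  <!-- Temporary workaround: disable the CLR GC metrics suspected of leaking -->
  <add key="ElasticApm:DisableMetrics" value="clr.gc.*" />
</appSettings>
```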
Here you can see how the memory increased over time. In the first segment we ran the applications with a 1.0 transaction sample rate, and then reduced it to 0.5 (seen in the 2nd segment) thinking that it might help (this was before analyzing a memory dump). As you can see, reducing the transaction sample rate helped a bit, but there is still a memory leak.
Please let me know if I can help with more details in the investigation.
A few days later and still no issues with memory.