Memory leak in version > 3.20.0
Describe the bug
We upgraded all 10 of our team's services to 3.24.0, and all of them started running out of memory.
All of them run in K8S. It takes between 30 minutes and 3 hours for a pod to run out of memory and restart.
We downgraded to 3.20.0 and the issue disappeared.
We did not see the issue in our staging environment, so I believe it only shows up under higher load.
To Reproduce
Steps to reproduce the behavior: Run an HTTP service under high load and observe that memory slowly increases until the node process eventually dies with a “JavaScript heap out of memory” error.
Expected behavior
The agent should not run out of memory, matching the behavior of version 3.20.0.
Environment (please complete the following information)
- OS: [Linux]
- Node.js version: 12.22.1
- APM Server version: 7.15.2
- Agent version: 3.24.0
How are you starting the agent? (please tick one of the boxes)
- Calling agent.start() directly (e.g. require('elastic-apm-node').start(...))
- Requiring elastic-apm-node/start from within the source code
- Starting node with -r elastic-apm-node/start
Issue Analytics
- State:
- Created 2 years ago
- Reactions:5
- Comments:21 (11 by maintainers)
Top GitHub Comments
Hello, deployed with asyncHooks: false and elastic-apm-node: 3.25.0 this morning. The red line is the rolling update of our services. Average memory usage has been very stable since the change. The two usual suspects (red and Earls Green lines) are no longer rapidly increasing their memory usage.
Some notes and requests for help for those following this (I see a number of people ❤️'d @Tirke’s helpful comment showing that the asyncHooks: false workaround helps):
Some details on my current understanding/guesses
Internal details that you can skip if not interested: Version 3.24.0 of the Node.js APM agent introduced a significant change in how the agent tracks async tasks. Part of the change involves the agent having a “run context” object for each async task – the things that are put on the Node.js event loop. Those run context objects have a reference to the transaction and spans that a given async task is part of (e.g. the run context for an HTTP request handler has a reference to the APM transaction that collects the trace data for that HTTP request/response).
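To make the "run context" idea concrete, here is a rough sketch that uses Node's AsyncLocalStorage as a stand-in. This is not the agent's code: the names runContext, withTransaction, currentTransaction and transactionSeenLater are hypothetical, and the transaction is a plain object. It only illustrates how an async task created while handling a request keeps a reference to that request's transaction.

```javascript
const { AsyncLocalStorage } = require('async_hooks')

// Stand-in for a per-task "run context" (hypothetical, not agent internals).
const runContext = new AsyncLocalStorage()

// Returns the transaction visible to the currently executing task, if any.
function currentTransaction () {
  const ctx = runContext.getStore()
  return ctx ? ctx.transaction : undefined
}

// Runs `fn` inside a run context that references `transaction`. Async tasks
// created inside `fn` (timers, promises) inherit the same context.
function withTransaction (transaction, fn) {
  return runContext.run({ transaction }, fn)
}

// Resolves with whatever transaction a timer sees when it fires after `delayMs`.
function transactionSeenLater (delayMs) {
  return new Promise((resolve) => {
    setTimeout(() => resolve(currentTransaction()), delayMs)
  })
}

// Usage: the transaction "ends" immediately, but the pending timer was created
// inside its context, so the object stays reachable until the timer fires.
const tx = { name: 'GET /ping', ended: false }
withTransaction(tx, () => {
  transactionSeenLater(50).then((seen) => {
    console.log('timer still references the transaction:', seen === tx)
  })
  tx.ended = true
})
```

While the timer is pending, tx cannot be garbage collected even though it has ended, which is the retention described above.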
When a transaction ends (e.g. the HTTP response has been sent), the transaction object stays in memory until all the associated async tasks complete. Two things can happen in an application here that will both result in increased memory usage:
doSomeTask in the following lives for 10s after the HTTP response has ended:
If the application is under constant load, then this results in higher memory usage. The APM agent exacerbates a situation like this because the APM transaction objects can be large (for large requests) compared to a small Promise.
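The code that accompanied this comment is not present in this copy. As a stand-in, here is a hypothetical sketch of the pattern described: a handler that starts a 10s task without awaiting it, so the task outlives the response. Apart from doSomeTask, every name here (makeHandler, startServer, the 'pong' body) is an assumption for illustration. The agent is deliberately left out so the snippet runs on its own.

```javascript
const http = require('http')

// Hypothetical async task that resolves after `delayMs` (10s by default).
function doSomeTask (delayMs = 10000) {
  return new Promise((resolve) => { setTimeout(resolve, delayMs) })
}

// Builds a request handler that starts the task without awaiting it, so the
// task is still pending after the response has been sent.
function makeHandler (delayMs) {
  return function handler (req, res) {
    doSomeTask(delayMs) // fire-and-forget: outlives the response
    res.end('pong')
  }
}

// Not called here; for example, startServer(3000) would serve the handler.
function startServer (port) {
  return http.createServer(makeHandler()).listen(port)
}
```

With the agent running, each request's transaction would stay reachable for those 10s through the pending task's run context. Under steady load, roughly ten seconds' worth of ended transactions are held in memory at any moment.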