Memory leak in version > 3.20.0

Describe the bug

We upgraded all 10 of our team’s services to 3.24.0, and all of them started running out of memory.

All of them run in K8s. It took between 30 minutes and 3 hours for a pod to run out of memory and restart.

We downgraded to 3.20.0 and the issue disappeared.

We did not see the issue in our staging environment, so I believe it only becomes noticeable under higher load.

To Reproduce

Steps to reproduce the behavior: Run an HTTP service under high load and observe that memory slowly increases until the Node.js process eventually dies with a “JavaScript heap out of memory” error.

Expected behavior

The services should not run out of memory, as was the case with version 3.20.0.

Environment (please complete the following information)

OS: Linux
Node.js version: 12.22.1
APM Server version: 7.15.2
Agent version: 3.24.0

How are you starting the agent? (please tick one of the boxes)

Calling agent.start() directly (e.g. require('elastic-apm-node').start(...))
Requiring elastic-apm-node/start from within the source code
Starting node with -r elastic-apm-node/start

Top GitHub Comments

Tirke commented, Jan 31, 2022

Hello, deployed with asyncHooks: false and elastic-apm-node: 3.25.0 this morning.

[Image: Screenshot 2021-12-16 at 11 24 32]

The red line is the rolling update of our services. Average memory usage has been very stable since the change. The two usual suspects (the red and Earls Green lines) are no longer rapidly increasing their memory usage.
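For readers who want to try the same workaround: asyncHooks is a boolean agent option that defaults to true in the 3.x agent and can be passed to start() or set through the ELASTIC_APM_ASYNC_HOOKS environment variable. A minimal sketch of the options object follows. The serviceName value is a placeholder and is not taken from this thread.

```javascript
// Options for the agent's start() call.
// Only `asyncHooks: false` comes from the thread; `serviceName` is a placeholder.
const agentOptions = {
  serviceName: 'my-service',
  // Turn off the async_hooks-based tracking of async operations.
  asyncHooks: false
}
```

The object would then be handed to the agent at the very top of the service entry point, before other modules are loaded, for example with require('elastic-apm-node').start(agentOptions).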

trentm commented, Dec 21, 2021

Some notes and requests for help for those following this (I see a number of people ❤️'d @Tirke’s helpful comment showing that the asyncHooks: false workaround helps):

Timing: I was able to start spending real time on this yesterday, but today is my last work day before the end of year holidays so there will be a delay on updates.
Ruling out versions v3.22.0 or v3.21.0 of the Node.js APM agent: I strongly suspect the issue is due to a change introduced in v3.24.0 of the APM agent. However, I’ve had one (private) report that someone may have seen increased memory usage by the agent in v3.22.0. If anyone who is seeing this issue is able to try v3.22.0 and/or v3.21.0 of the agent and confirm whether or not that “fixes” the increased memory usage, that would really help.

Some details on my current understanding/guesses

Internal details that you can skip if not interested: Version 3.24.0 of the Node.js APM agent introduced a significant change in how the agent tracks async tasks. Part of the change involves the agent having a “run context” object for each async task – the things that are put on the Node.js event loop. Those run context objects have a reference to the transaction and spans that a given async task is part of (e.g. the run context for an HTTP request handler has a reference to the APM transaction that collects the trace data for that HTTP request/response).
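The run-context idea can be illustrated with Node's async_hooks module. This is a simplified sketch under stated assumptions, not the agent's actual code: runContexts, enterTransaction and tasksReferencing are hypothetical names, and the transaction is a plain object standing in for an APM transaction.

```javascript
const asyncHooks = require('async_hooks')

// Hypothetical bookkeeping: async task id -> run context.
const runContexts = new Map()

asyncHooks.createHook({
  init (asyncId) {
    // A new async task inherits the run context of the code that created it.
    const parent = runContexts.get(asyncHooks.executionAsyncId())
    if (parent) runContexts.set(asyncId, parent)
  },
  destroy (asyncId) {
    // The reference is dropped only once the async task itself is gone.
    runContexts.delete(asyncId)
  }
}).enable()

// Bind a (hypothetical) transaction object to the currently executing task.
function enterTransaction (transaction) {
  runContexts.set(asyncHooks.executionAsyncId(), { transaction })
}

// Count the async tasks whose run context still references the transaction.
function tasksReferencing (transaction) {
  let count = 0
  for (const ctx of runContexts.values()) {
    if (ctx.transaction === transaction) count++
  }
  return count
}

const transaction = { name: 'GET /ping' }
enterTransaction(transaction)
setTimeout(() => {}, 10) // stands in for a background task started by the handler
console.log(tasksReferencing(transaction)) // includes the pending timer
```

As long as an entry for the timer exists in runContexts, the transaction object it points to cannot be garbage collected.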

When a transaction ends (e.g. the HTTP response has been sent), the transaction object stays in memory until all the associated async tasks complete. Two things can happen in an application here that will both result in increased memory usage:

If some of those associated async tasks only complete long after the response has been sent, then the APM transaction object will stay in memory for that “long” time after the response. For example, doSomeTask in the following lives for 10s after the HTTP response has ended:

const apm = require('elastic-apm-node').start()
const express = require('express')
const app = express()
const port = 3000

// Some async task that takes 10s to complete in the background.
async function doSomeTask () {
  return new Promise((resolve) => { 
    setTimeout(resolve, 10000) 
  })
}

app.get('/ping', (req, res) => {
  doSomeTask()
  res.send('pong')
})

app.listen(port, () => {
  console.log(`listening at http://localhost:${port}`)
})

If the application is under constant load, then this results in higher memory usage. The APM agent exacerbates a situation like this because the APM transaction objects can be large (for large requests) compared to a small Promise.
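A rough way to size this effect is Little's law: under steady load, the number of transactions held in memory is about the request rate multiplied by how long each one lingers. The sketch below applies that with hypothetical numbers (200 requests per second, 50 KiB per transaction) that are not measurements from this issue. Only the 10s linger time mirrors the example above.

```javascript
// Steady-state estimate (Little's law):
// items held = arrival rate x time each item stays in the system.
function estimateRetainedBytes (requestsPerSecond, lingerSeconds, bytesPerTransaction) {
  const retainedTransactions = requestsPerSecond * lingerSeconds
  return retainedTransactions * bytesPerTransaction
}

// Hypothetical load: 200 req/s, background task lingering 10 s,
// and an assumed 50 KiB retained per transaction object.
const bytes = estimateRetainedBytes(200, 10, 50 * 1024)
console.log(`${(bytes / (1024 * 1024)).toFixed(1)} MiB retained`) // prints "97.7 MiB retained"
```

The estimate grows linearly with both the linger time and the per-transaction size, so a slow background task combined with large requests has a multiplied effect.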

If there is a leak of async tasks in the application (e.g. leaked Promises) then, with the APM agent, those can keep APM transactions alive in memory. By themselves Promises are very small, and an application could leak many of them without noticing much increased memory usage. However, with indirectly attached APM transactions, the memory usage will be made much worse. One way to see if your application has a growing number of Promises in memory is by including this snippet of code (the APM agent can be disabled while doing so, to rule it out as the cause):

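// Track the async IDs of all promises that are still in memory.
const promises = new Set()
require('async_hooks').createHook({
  init: (asyncId, type) => {
    if (type === 'PROMISE') {
      promises.add(asyncId)
    }
  },
  // For promises, destroy is called once the promise has been garbage collected.
  destroy: (asyncId) => {
    promises.delete(asyncId)
  }
}).enable()
// Report the count every 5s; a count that keeps growing under steady load suggests a leak.
setInterval(() => {
  console.log(`Promises in memory: ${promises.size}`)
}, 5000)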