
Memory leak in version > 3.20.0


Describe the bug

We upgraded all of our team's services (10 of them) to 3.24.0, and all of them started running out of memory.

All of them run in K8s. It would take between 30 minutes and 3 hours for a pod to run out of memory and restart.

We downgraded to 3.20.0 and the issue disappeared.

We were not seeing the issue in a staging environment, so I believe it only becomes noticeable under higher load.

To Reproduce

Steps to reproduce the behavior: run an HTTP service under high load and observe that memory slowly increases until the Node process eventually dies with a “JavaScript heap out of memory” error.
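As a sketch, one way to generate that kind of sustained load (assuming the autocannon npm package and a service listening on http://localhost:3000; both are illustrative choices, not taken from this issue):

const autocannon = require('autocannon')

// Hammer the service with 100 concurrent connections for 10 minutes and
// print the request stats when done.
autocannon({
  url: 'http://localhost:3000',
  connections: 100,
  duration: 600
}, (err, result) => {
  if (err) throw err
  console.log(result.requests)
})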

Expected behavior

Do not run out of memory, as with version 3.20.0.

Environment (please complete the following information)

  • OS: Linux
  • Node.js version: 12.22.1
  • APM Server version: 7.15.2
  • Agent version: 3.24.0

How are you starting the agent? (please tick one of the boxes)

  • Calling agent.start() directly (e.g. require('elastic-apm-node').start(...))
  • Requiring elastic-apm-node/start from within the source code
  • Starting node with -r elastic-apm-node/start
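
For reference, a minimal sketch of the first option, starting the agent programmatically before anything else is required (serviceName and serverUrl are placeholder values, not taken from this issue):

// Call agent.start() directly, as the first thing the application does.
const apm = require('elastic-apm-node').start({
  serviceName: 'my-service',           // placeholder service name
  serverUrl: 'http://localhost:8200'   // placeholder APM Server URL
})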

Additional context

  • Agent config options

    replace this line with your agent config options
    
  • package.json dependencies:

    replace this line with your dependencies section from package.json
    

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Reactions: 5
  • Comments: 21 (11 by maintainers)

Top GitHub Comments

5 reactions
Tirke commented, Jan 31, 2022

Hello, deployed with asyncHooks: false and elastic-apm-node: 3.25.0 this morning.

[Screenshot 2021-12-16 at 11 24 32]

The red line marks the rolling update of our services. Average memory usage has been very stable since the change. The two usual suspects (the red and Earls Green lines) are no longer rapidly increasing their memory usage.
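
For anyone wanting to try the same workaround, a minimal sketch of disabling async_hooks-based tracking when starting the agent programmatically (the asyncHooks option can also be set via the ELASTIC_APM_ASYNC_HOOKS environment variable):

// Workaround discussed in this thread: disable async_hooks usage in the agent.
const apm = require('elastic-apm-node').start({
  asyncHooks: false
})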

4 reactions
trentm commented, Dec 21, 2021

Some notes and requests for help for those following this (I see a number of people ❤️’d @Tirke’s helpful comment showing that the asyncHooks: false workaround helps):

  • Timing: I was able to start spending real time on this yesterday, but today is my last work day before the end-of-year holidays, so there will be a delay on updates.
  • Ruling out versions v3.22.0 or v3.21.0 of the Node.js APM agent: I strongly suspect the issue is due to a change introduced in v3.24.0 of the APM agent. However, I’ve had one (private) report that someone has seen increased memory usage by the agent in v3.22.0. If anyone seeing this issue is able to try v3.22.0 and/or v3.21.0 of the agent and confirm whether or not that “fixes” the increased memory usage, that would really help (one way to pin an exact agent version is sketched below).
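
If it helps, pinning the agent to an exact version for such a test is a one-line npm command (shown here as an illustration, not specific to any project in this thread):

npm install --save-exact elastic-apm-node@3.22.0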

Some details on my current understanding/guesses

Internal details that you can skip if not interested: Version 3.24.0 of the Node.js APM agent introduced a significant change in how the agent tracks async tasks. Part of the change involves the agent having a “run context” object for each async task – the things that are put on the Node.js event loop. Those run context objects have a reference to the transaction and spans that a given async task is part of (e.g. the run context for an HTTP request handler has a reference to the APM transaction that collects the trace data for that HTTP request/response).

When a transaction ends (e.g. the HTTP response has been sent), the transaction object stays in memory until all the associated async tasks complete. Two things can happen in an application here that will both result in increased memory usage:

  1. If some of those associated async tasks only complete long after the response has been sent, then the APM transaction object stays in memory for that “long” time after the response. For example, doSomeTask in the following lives for 10s after the HTTP response has ended:
const apm = require('elastic-apm-node').start()
const express = require('express')
const app = express()
const port = 3000

// Some async task that takes 10s to complete in the background.
async function doSomeTask () {
  return new Promise((resolve) => { 
    setTimeout(resolve, 10000) 
  })
}

app.get('/ping', (req, res) => {
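  // Fire-and-forget: the timer inside doSomeTask() outlives the response,
  // so the APM transaction for this request stays in memory for ~10s
  // after res.send().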
  doSomeTask()
  res.send('pong')
})

app.listen(port, () => {
  console.log(`listening at http://localhost:${port}`)
})

If the application is under constant load, then this results in higher memory usage. The APM agent exacerbates a situation like this because the APM transaction objects can be large (for large requests) compared to a small Promise.

  2. If there is a leak of async tasks in the application (e.g. leaked Promises), then, with the APM agent, those can keep APM transactions alive in memory. By themselves Promises are very small, and an application could leak many of them without noticing much increased memory usage. However, with indirectly attached APM transactions, the memory usage is made much worse. One way to see if your application has a growing number of Promises in memory is to include this snippet of code (the APM agent can be disabled while doing this, to rule the agent itself out):
const promises = new Set()
// Track every Promise as it is created (init) and drop it when its async
// resource is destroyed; a steadily growing set indicates leaked Promises.
require('async_hooks').createHook({
  init: (asyncId, type) => {
    if (type === 'PROMISE') {
      promises.add(asyncId)
    }
  },
  destroy: (asyncId) => {
    promises.delete(asyncId)
  }
}).enable()
// Report the number of live Promises every 5 seconds.
setInterval(() => {
  console.log(`Promises in memory: ${promises.size}`)
}, 5000)
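
To illustrate what that snippet would surface, here is a hypothetical handler (the /leaky route and the pending array are made up for illustration) that leaks one pending Promise per request:

const express = require('express')
const app = express()

// Each request creates a Promise that never settles and stays reachable via
// the `pending` array, so it is never destroyed. The "Promises in memory"
// counter from the snippet above grows by one per request, and with the APM
// agent enabled each leaked Promise can indirectly keep its APM transaction
// in memory as well.
const pending = []
app.get('/leaky', (req, res) => {
  pending.push(new Promise(() => {}))
  res.send('ok')
})

app.listen(3000)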