Memory leak
Hi!
I currently have an implementation of prom-client running that seems to continuously grow the Node heap until an eventual crash/service restart when collecting API metrics. At the moment I am collecting the default metrics plus a couple of custom ones that just count WebSocket connections, and these all work fine. The problem arises from the API metrics I collect using a middleware that sits on our Hapi routes. This middleware listens for ‘onRequest’ and then ‘response’, records a response time in milliseconds, and adds labels for method, path, statusCode, and username.
Here is what the middleware looks like (I do not believe the problem is with this):
/**
 * This is the middleware that looks at each request as it comes through the server.
 * @param {Object} server This is the instantiated Hapi server that we wish to listen on.
 * @param {Object} options This is an object that can contain additional data.
 * @param {Function} next This is the callback function that allows additional requests to continue.
 * @return {Function} Callback function.
 */
const middlewareWrapper = (server, options, next) => {
  // Stamp every incoming request with a high-resolution start time.
  server.ext('onRequest', (request, reply) => {
    request.startTime = process.hrtime()
    return reply.continue()
  })

  // When the response goes out, hand the status code, start time and
  // sanitized request data off to the metrics/logging logic.
  server.on('response', (request) => {
    logic.apiPerformance(request.response.statusCode, request.startTime, sanitizeRequestData(request))
  })

  return next()
}
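For context, this is roughly how the plugin gets wired into Hapi 16 (a minimal sketch for illustration; the plugin name, port, and server setup here are placeholders, not our actual code):

const Hapi = require('hapi')

// Hapi 16 plugins need register attributes; the name/version here are placeholders.
middlewareWrapper.attributes = { name: 'metrics-middleware', version: '1.0.0' }

const server = new Hapi.Server()
server.connection({ port: 3000 })

server.register({ register: middlewareWrapper, options: {} }, (err) => {
  if (err) throw err
  server.start((err) => {
    if (err) throw err
  })
})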
The metrics themselves are recorded inside that apiPerformance function, which looks like this:
/**
 * This function takes various information about a request, performs some basic calculations,
 * and then sends that off to Loggly and (when enabled) the Prometheus metrics.
 * @param {Integer} statusCode This is the HTTP statusCode that will be returned to the user with the response.
 * @param {Array} startTime The process.hrtime() tuple captured when the request was received. https://nodejs.org/api/process.html#process_process_hrtime_time
 * @param {Object} responseData An object with sanitized response data.
 */
apiPerformance: (statusCode, startTime, responseData) => {
  // Convert the [seconds, nanoseconds] difference into milliseconds.
  const diff = process.hrtime(startTime)
  const responseTime = ((diff[0] * 1e3) + (diff[1] * 1e-6))
  const time = new Date()

  logger.transports.loggly.log('info', 'Route Log', {
    responseCode: statusCode,
    route: responseData.route,
    host: responseData.host,
    port: responseData.port,
    method: responseData.method,
    userData: responseData.userData,
    responseTime: `${responseTime}ms`,
    params: responseData.params,
    timestamp: time.toISOString(),
    tags: ['events', 'api', 'performance']
  }, (error, response) => {
    if (error) console.log(error)
  })

  if (config.metrics.prometheus.enabled) {
    requestDurationSummary.labels(responseData.method, responseData.route, statusCode, responseData.userData.displayName).observe(responseTime)
    requestBucketsHistogram.labels(responseData.method, responseData.route, statusCode, responseData.userData.displayName).observe(responseTime)
  }
}
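As a quick illustration of the responseTime calculation above (not part of the real code), process.hrtime(startTime) returns a [seconds, nanoseconds] pair, so the conversion works out like this:

const diff = [1, 250000000]                         // 1 s and 250,000,000 ns elapsed
const responseTime = (diff[0] * 1e3) + (diff[1] * 1e-6)
console.log(responseTime)                           // 1250 (milliseconds)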
The metrics are registered at the top of that file like so:
const requestDurationSummary = prometheus.createSummary('http_request_duration_milliseconds', 'request duration in milliseconds', ['method', 'path', 'status', 'user'])
const requestBucketsHistogram = prometheus.createHistogram('http_request_buckets_milliseconds', 'request duration buckets in milliseconds.', ['method', 'path', 'status', 'user'], [ 500, 2000 ])
Node: v6.1.0, Hapi: v16.4.3, prom-client: v9.1.1
We include prom-client as part of a custom package (it's basically a common-libs package) that we pull into our projects. Inside this package we include prom-client and expose only the methods we use (collectDefaultMetrics, registerMetrics, createGauge, createSummary, createHistogram). Another thing to note is that all metrics except http_request_duration_milliseconds and http_request_buckets_milliseconds (the two metrics posted above) are registered inside the application we are tracking metrics for; those two are registered in the code posted above, which lives within the custom package itself.
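For reference, here is roughly what those wrapper methods boil down to (a sketch using the config-object constructor form from prom-client's README; the module layout and names below are approximations, since the actual package is internal and may use the older positional arguments):

const prometheus = require('prom-client')

module.exports = {
  collectDefaultMetrics: (options) => prometheus.collectDefaultMetrics(options),
  registerMetrics: () => prometheus.register.metrics(),
  createGauge: (name, help, labelNames) =>
    new prometheus.Gauge({ name, help, labelNames }),
  createSummary: (name, help, labelNames) =>
    new prometheus.Summary({ name, help, labelNames }),
  createHistogram: (name, help, labelNames, buckets) =>
    new prometheus.Histogram({ name, help, labelNames, buckets })
}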
If there is any other information I can provide, please let me know. Any help or suggestions are greatly appreciated. Thanks!
Created 6 years ago · Comments: 29 (7 by maintainers)
For what it’s worth, I can confirm that 12.0.0 works infinitely better than 11.5.3, which either slowly grew in memory over days, or in some cases OOM’d a minute after start. Thanks for fixing this! 🙇‍♀️
There is a PR open that might fix it, but I’m waiting for clarifications in it. If you are in an experimental mood you could try that branch out and see if it solves your problem.