[Bug]: FileDescriptors not properly released on file rotate

🔎 Search Terms

file, disk usage

The problem

Environment: Ubuntu 18.04 LTS

When running the winston file transport with the tailable option set to true, we found that on log rotation the files are renamed up to maxFiles, but the old file descriptors remain open in the node process.

The result is that if you calculate disk usage with a tool like "du", the deleted files no longer show up, but the disk space is still "used" and will be reported as used by a tool like "df". If the node application using the winston file transport in this way is left running long enough, eventually all file descriptors on the partition will be in use by the node process, and creating any new file will fail because no file descriptor is available.
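To illustrate why "du" and "df" disagree, here is a minimal sketch that deletes a file while a descriptor to it is still open. It assumes a POSIX-like system (Linux or macOS); the helper name demoUnlinkedButOpen and the temporary file are made up for this example and have nothing to do with winston itself.

```javascript
'use strict';
// Assumption: POSIX-like OS. demoUnlinkedButOpen is a hypothetical helper.
const fs = require('fs');
const os = require('os');
const path = require('path');

function demoUnlinkedButOpen() {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'fd-demo-'));
  const file = path.join(dir, 'example.log');

  const fd = fs.openSync(file, 'w');
  fs.writeSync(fd, 'x'.repeat(4096));

  // Remove the name. Tools that walk the directory tree can no longer see it.
  fs.unlinkSync(file);

  // The open descriptor still reaches the data, so the space stays allocated.
  const stat = fs.fstatSync(fd);
  const result = {
    visibleByName: fs.existsSync(file),
    linkCount: stat.nlink,
    sizeStillHeld: stat.size,
  };

  // Only after the last descriptor is closed can the filesystem free the space.
  fs.closeSync(fd);
  fs.rmdirSync(dir);
  return result;
}

console.log(demoUnlinkedButOpen());
```

In the leak described here, the closing step never happens for the rotated files, so the space is never returned while the process lives.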

What version of Winston presents the issue?

v3.2.1 and v3.6.0

What version of Node are you using?

v14.19.0

If this worked in a previous version of Winston, which was it?

unknown (but works in Node v12 and Node v10)

Minimum Working Example

This example will set up the file rotation such that you will quickly see the issue. You can play with the MAX_SIZE, MAX_FILES, and TIMEOUT_MS if the issue is happening too fast.

'use strict';

const MAX_SIZE = 2000;
const MAX_FILES = 10;
const TIMEOUT_MS = 1000;

const winston = require('winston');
const {format} = require('logform');
const path = require('path');
const {combine, timestamp, json} = format;

// Set up file transport - I assume logs folder exists
const logsDir = path.resolve(process.cwd(), './logs');
const filePath = path.join(logsDir, 'example.log');
const fileTransport = {
  tailable: true,
  maxsize: MAX_SIZE,
  maxFiles: MAX_FILES,
  filename: filePath,
  handleExceptions: true,
  level: 'debug',
};
fileTransport.format = combine(
  timestamp({format: 'YYYY-MM-DD HH:mm:ss.SSS'}),
  json(),
);

// Create logger with file transport
const logger = winston.createLogger({
  transports: [new winston.transports.File(fileTransport)],
  exitOnError: false,
});
logger.log('info', 'Created file logger');

// Add periodic logs so that files are rotated
let i = 0;
setTimeout(timeoutLogger, TIMEOUT_MS);
function timeoutLogger() {
  i++;
  logger.log('info', 'another log message ' + i);
  logger.log('error', 'error message ' + i);
  logger.log('http', 'http log message ' + i);
  logger.log('verbose', 'verbose log message ' + i);
  logger.log('debug', 'debug log message ' + i);
  setTimeout(timeoutLogger, TIMEOUT_MS);
}

While running this example, use a terminal to check the file descriptor status using:

  • lsof | grep example | grep deleted
  • ps aux | grep node (to find the node PID) and then ls -al /proc/${PID}/fd

You'll see "deleted" files associated with open file descriptors.
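The same check can be done from inside the process. The following is a rough, Linux-only sketch that reads /proc/self/fd and keeps the link targets that the kernel has marked as "(deleted)". The helper names filterDeleted and readFdTargets are hypothetical, and on systems without /proc the reader simply returns an empty list.

```javascript
'use strict';
// Assumption: Linux with /proc mounted. Helper names are hypothetical.
const fs = require('fs');
const path = require('path');

// Pure helper: keep only link targets marked as deleted by the kernel.
function filterDeleted(targets) {
  return targets.filter((target) => target.endsWith(' (deleted)'));
}

// Read the symlink target of every descriptor held by a process.
function readFdTargets(pid = 'self') {
  const fdDir = path.join('/proc', String(pid), 'fd');
  let entries;
  try {
    entries = fs.readdirSync(fdDir);
  } catch (err) {
    return []; // no /proc (non-Linux) or no permission to read it
  }
  const targets = [];
  for (const entry of entries) {
    try {
      targets.push(fs.readlinkSync(path.join(fdDir, entry)));
    } catch (err) {
      // The descriptor was closed between readdir and readlink; skip it.
    }
  }
  return targets;
}

const leaked = filterDeleted(readFdTargets());
console.log(`${leaked.length} descriptor(s) point at deleted files`);
```

Calling this periodically from the example above (for instance inside timeoutLogger) should show the count growing if the leak is present.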

Additional information

As I mentioned above, this problem is occurring in Node 14 and we did NOT see it in Node 12 or Node 10. A check through the Node 14 documentation showed a change to the fd garbage collection that may be what is triggering this issue now: https://github.com/nodejs/node/pull/28396

I poked around the winston file transport code, and the problem seems to be tied to calling the _rotateFile() function multiple times (multiple logs triggering the logic that the max file size has been reached and the file needs to be rotated). That function then calls the _incFile() function multiple times. The asynchronous handling of the multiple calls seems to cause the file descriptor cruft.

Just to gain some proof, I modified the line lib/winston/transports/file.js:172 as follows, and that does fix the problem.

// Original line:
this._endStream(() => this._rotateFile());

// My change:
this._endStream(() => {
  if (this._rotate === true) {
    this._rotate = false;
    this._rotateFile();
  }
});

I don't suppose that is a good and robust fix, but maybe it points to a potential solution. I'm happy to try out potential solutions if the experts have thoughts on that. Please let me know if you need any other information or if there is anything I can do to help resolve this.
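To show the idea behind that guard outside of winston, here is a small stand-alone simulation. FakeTransport, onSizeExceeded and burst are invented for illustration and are not winston code. The sketch only counts how many times a rotation would run when several writes notice the size limit before the asynchronous callback fires; it does not reproduce the descriptor leak itself.

```javascript
'use strict';
// Stand-alone simulation. FakeTransport is hypothetical and is not winston's code.
class FakeTransport {
  constructor({ guarded }) {
    this.guarded = guarded;
    this.needsRotate = false;
    this.rotations = 0;
  }

  // Called by every write that notices the size limit was exceeded.
  onSizeExceeded() {
    this.needsRotate = true;
    // Stand-in for ending the stream: the callback runs asynchronously.
    return new Promise((resolve) => {
      setImmediate(() => {
        if (!this.guarded) {
          this.rotations++; // every pending callback rotates again
        } else if (this.needsRotate) {
          this.needsRotate = false; // the first callback claims the rotation
          this.rotations++;
        }
        resolve();
      });
    });
  }
}

// Fire several size-exceeded events in the same tick, then count rotations.
async function burst(guarded, writes = 5) {
  const transport = new FakeTransport({ guarded });
  const pending = Array.from({ length: writes }, () => transport.onSizeExceeded());
  await Promise.all(pending);
  return transport.rotations;
}

burst(false).then((n) => console.log('unguarded rotations:', n));
burst(true).then((n) => console.log('guarded rotations:', n));
```

With the flag check, a burst of writes results in a single rotation, which matches the behavior the modified line is aiming for.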

Issue Analytics

  • State: open
  • Created a year ago
  • Reactions: 3
  • Comments: 6 (2 by maintainers)

Top GitHub Comments

1 reaction
sn-tm commented, Apr 11, 2022

Can confirm we are seeing the same issue, 3.2.1. Using lsof reports hundreds of file descriptors holding onto the *1.log and *9.log files. The issue only seems to happen to applications which log in rapid bursts (that's a separate issue which we'll address!) I haven't had a chance to troubleshoot yet.

0 reactions
wbt commented, Nov 15, 2022

I appreciate the efforts here to report and diagnose the issue - it does seem like something that should be fixed. #2100 is still marked as a draft and fails checks, and might not be a root cause fix. With no funding for maintainers on the project, a deep dive into finding the root cause will have to come from the community, unless a core maintainer gets blocked by the same issue.
