Problems when dealing with invalidly-encoded filenames

  • Operating System: Debian 9
  • Node.js version: 8.9.3
  • fs-extra version: 5.0.0

Hi there. I ran into some cases where remove() was unable to remove a directory because of filename encoding issues. I believe the empty, copy, and move operations (and their sync counterparts) have similar issues: basically anything that relies on fs.readdir / fs.readdirSync.

My issue arose when trying to fs.remove() some directories that were created by an unzip operation. During remove's (rimraf's) tree walk, some of the returned directories appeared not to exist (although they did), so the final unlink operation failed because the directory had never actually been emptied.

It seems that, in general, names on a file system are just byte sequences, which are not guaranteed to be valid strings. As a result, the bytes -> string -> bytes conversion that happens when listing a directory and then operating on its items in Node does not always produce the same file name that was read.
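To make the round-trip problem concrete, here is a minimal sketch using only Node's Buffer API. The byte sequence is a made-up file name chosen for illustration (0xFF can never appear in valid UTF-8); it is not taken from the directories in this report.

```javascript
// A file name as raw bytes: "fo" followed by 0xFF, which is never valid in UTF-8.
const original = Buffer.from([0x66, 0x6f, 0xff]);

// bytes -> string: the invalid byte is replaced with U+FFFD.
const asString = original.toString('utf8');

// string -> bytes: U+FFFD is encoded as EF BF BD, so the original 0xFF is lost.
const roundTripped = Buffer.from(asString, 'utf8');

console.log(original.equals(roundTripped)); // false
console.log(roundTripped); // <Buffer 66 6f ef bf bd>
```

A path built from asString therefore names a file that does not exist, which would explain why entries returned by readdir seem to be missing.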

This encoding problem has been a known Node issue for a while, which is why an option was added to return Buffers from fs.readdir. My suggestion is to update the affected methods to use this Buffer option. I’m happy to work on a PR, but I wanted to at least get some feedback and discuss the issue before diving in.
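As a rough sketch of what the Buffer-based approach could look like (not fs-extra's actual implementation), the tree walk below keeps every path as a Buffer from readdir through to unlink. The helper names joinBuf and removeTreeSync are hypothetical, and the join assumes a POSIX file system where the separator is the single byte 0x2F.

```javascript
const fs = require('fs');

// Hypothetical helper: join a directory path and an entry name without decoding either.
// Assumes a POSIX file system, where the separator is the single byte 0x2F ('/').
function joinBuf(dir, name) {
  return Buffer.concat([dir, Buffer.from('/'), name]);
}

// Hypothetical helper: recursively remove `target`, keeping every path as a Buffer.
function removeTreeSync(target) {
  const stat = fs.lstatSync(target);
  if (!stat.isDirectory()) {
    fs.unlinkSync(target);
    return;
  }
  // encoding: 'buffer' makes readdirSync return the raw name bytes.
  for (const name of fs.readdirSync(target, { encoding: 'buffer' })) {
    removeTreeSync(joinBuf(target, name));
  }
  fs.rmdirSync(target);
}
```

A caller would pass the root as a Buffer, for example removeTreeSync(Buffer.from('/tmp/unzipped')), where the path is only a placeholder. Because no name is ever decoded to a string, the bytes handed to unlink and rmdir are exactly the bytes readdir returned.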

Here are a couple of Node issues relating to the file name encoding problem:

https://github.com/nodejs/node-v0.x-archive/issues/2387
https://github.com/nodejs/node/pull/5616

Thanks!

Issue Analytics

  • State: open
  • Created 5 years ago
  • Comments: 5

Top GitHub Comments

2 reactions
bcoe commented, Aug 26, 2021

@rossj @RyanZim, bringing this issue back up, because we face the same problem with fs.cp() in Node.js.

I’ve been working on a port of Node.js’ path methods that work on Buffers:

https://github.com/bcoe/path-buffer

I’ve made an effort to detect UTF-8 vs. UTF-16, so that the appropriate separator is added or removed by methods like join and dirname, but I’m not an expert at string encodings, so it would be good to have someone who’s bumped into the issue confirm the logic is sound.
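For the simpler of the two cases, a byte-level dirname can be sketched as follows. This is not path-buffer's API or implementation; dirnameBuf is a hypothetical name, and the sketch assumes POSIX paths whose separator is the single byte 0x2F, so it does not attempt the UTF-16 detection described above. It also ignores trailing separators, which path.dirname handles.

```javascript
// Hypothetical helper: dirname for a POSIX path held as a Buffer.
// On POSIX the byte 0x2F always means '/', even when the other bytes are not valid UTF-8.
function dirnameBuf(p) {
  const idx = p.lastIndexOf(0x2f);
  if (idx === -1) return Buffer.from('.'); // no separator: relative name
  if (idx === 0) return Buffer.from('/');  // entry directly under the root
  return p.subarray(0, idx);               // everything before the last separator
}

// "/a/" followed by the invalid byte 0xFF
const parent = dirnameBuf(Buffer.from([0x2f, 0x61, 0x2f, 0xff]));
console.log(parent.toString('utf8')); // /a
```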

1 reaction
rossj commented, May 3, 2018

Ah, I was thinking of not filtering out non-UTF-8 names, and instead sending whatever string we get from the UTF-8 conversion to the filter function. I’m pretty sure that Buffer.toString() will insert U+FFFD � for invalid UTF-8 sequences instead of failing. Continuing to send these potentially incorrect strings to the filter function is no worse than the current situation, and it allows string-based filtering of all files (regardless of whether they are UTF-8 or not) if the user only cares about ASCII, e.g. return src.indexOf('thing') >= 0.
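A small sketch of that idea, assuming names arrive as Buffers: the decoded string is used only to make the filter decision, while the original Buffer is what gets passed on. filterNames and the sample names are hypothetical, not part of fs-extra.

```javascript
// Hypothetical helper: apply a string-based filter to Buffer names.
// The string is used only for the filter decision; the Buffer itself is returned.
function filterNames(names, filter) {
  return names.filter((name) => filter(name.toString('utf8')));
}

const names = [
  Buffer.from('thing.txt'),
  Buffer.from([0x74, 0x68, 0x69, 0x6e, 0x67, 0xff]), // "thing" + invalid byte 0xFF
  Buffer.from('other.txt'),
];

const kept = filterNames(names, (src) => src.indexOf('thing') >= 0);
console.log(kept.length); // 2
```

The invalidly encoded name still matches because its ASCII prefix decodes unchanged; only the 0xFF byte becomes U+FFFD in the string the filter sees.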


