Chinese filename decode
File: 中文测试.zip
The zip file contains 中文测试.md. When I pass `decodeStrings: true`, the result is:

When I pass `decodeStrings: false`, the error `The "path" argument must be of type string` is thrown.
Issue Analytics
- Created 5 years ago
- Comments: 12 (6 by maintainers)
Top GitHub Comments
I did some research into Info-ZIP’s charset detection code, and in the absence of General Purpose Bit 11, Info-ZIP uses a different charset depending on the operating system. It will only use CP437 as required by the spec on some platforms, presumably DOS. However, on Linux and Mac, Info-ZIP will simply always use UTF-8 for decoding file paths, because UTF-8 is the “native” charset on those platforms, whatever that means. This suggests it’s safe for yauzl to drop support for CP437 and just use UTF-8 in all situations as well. 🤔
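The two policies in the comment above can be compared side by side. The following is a minimal sketch that assumes names arrive as raw bytes. The "spec" policy decodes a name as UTF-8 when general-purpose bit 11 is set and as CP437 otherwise. The "utf8" policy always uses UTF-8, which is how the comment describes Info-ZIP's behaviour on Linux and macOS. The names `decodeFileName`, `decodeCp437` and `NamePolicy` are hypothetical, and bytes below 0x80 are treated as plain ASCII, which simplifies strict CP437.

```typescript
// Two decoding policies for a ZIP entry's raw file-name bytes:
//   "spec": UTF-8 if general-purpose bit 11 is set, otherwise CP437.
//   "utf8": always UTF-8, ignoring bit 11 (the Linux/macOS Info-ZIP
//           behaviour described in the comment).

// Upper half (0x80-0xFF) of IBM code page 437, one UTF-16 unit per byte.
// Simplification: bytes 0x00-0x7F are decoded as plain ASCII here, whereas
// strict CP437 renders most control codes as graphic symbols.
const CP437_HIGH =
  "ÇüéâäàåçêëèïîìÄÅ" + // 0x80-0x8F
  "ÉæÆôöòûùÿÖÜ¢£¥₧ƒ" + // 0x90-0x9F
  "áíóúñÑªº¿⌐¬½¼¡«»" + // 0xA0-0xAF
  "░▒▓│┤╡╢╖╕╣║╗╝╜╛┐" + // 0xB0-0xBF
  "└┴┬├─┼╞╟╚╔╩╦╠═╬╧" + // 0xC0-0xCF
  "╨╤╥╙╘╒╓╫╪┘┌█▄▌▐▀" + // 0xD0-0xDF
  "αßΓπΣσµτΦΘΩδ∞φε∩" + // 0xE0-0xEF
  "≡±≥≤⌠⌡÷≈°∙·√ⁿ²■\u00a0"; // 0xF0-0xFF

// General-purpose bit 11, the "language encoding flag" marking UTF-8 names.
const UTF8_FLAG = 0x0800;

type NamePolicy = "spec" | "utf8";

// Map each byte to a character: ASCII below 0x80, the CP437 table above it.
function decodeCp437(bytes: Uint8Array): string {
  let out = "";
  bytes.forEach((b) => {
    out += b < 0x80 ? String.fromCharCode(b) : CP437_HIGH.charAt(b - 0x80);
  });
  return out;
}

function decodeFileName(
  raw: Uint8Array,
  generalPurposeBitFlag: number,
  policy: NamePolicy,
): string {
  const useUtf8 = policy === "utf8" || (generalPurposeBitFlag & UTF8_FLAG) !== 0;
  return useUtf8 ? new TextDecoder("utf-8").decode(raw) : decodeCp437(raw);
}
```

Under the "spec" policy, the UTF-8 bytes of 中 (E4 B8 AD) decode as "Σ╕¡" when bit 11 is missing. A reader that follows the spec produces this kind of garbled name for an archive that was written as UTF-8 but never set the flag. The "utf8" policy returns 中文测试.md unchanged.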
@imcuttle I have a need to handle similar not-so-standard .zip files in my application, and I wanted to share my heuristic solution.
If you only need to deal with this file and similar files that are always UTF-8 (even if they don't indicate this), you can use the `decodeStrings: false` option and convert the names to strings yourself. Your `The "path" argument must be of type string` error is likely coming from some other code downstream that expects a string, so you probably need to do the Buffer -> string conversion before that point.

In my case, it is a bit more complicated, as I need to handle zip files that are UTF-8 (with and without the proper bit being set) as well as files that are CP437 encoded. My solution is to use `decodeStrings: false`, collect all of the `ZipEntries` and their `fileName` Buffers, and then inspect these name Buffers to try to guess the proper encoding. Specifically, I use the code in this gist to get some information on the name Buffers, followed by this logic:
This has been working well for the .zip files that I deal with.
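The commenter's exact logic is not reproduced here. The sketch below shows one plausible heuristic along the same lines and should be read as an assumption. It reads all names as raw bytes (`decodeStrings: false`) and skips names that set bit 11, since those are UTF-8 by definition. For the archive's remaining names, it assumes CP437 as soon as any of them fails strict UTF-8 decoding, and UTF-8 otherwise. `RawEntry`, `isValidUtf8` and `guessNameEncoding` are hypothetical names. The `fileName` and `generalPurposeBitFlag` fields are modelled on yauzl's `Entry`, and a Node Buffer is a `Uint8Array`.

```typescript
// Hypothetical heuristic: choose one encoding for all entry names in an
// archive that do not set general-purpose bit 11. Names are assumed to have
// been read as raw bytes (yauzl's decodeStrings: false).

// Shaped after yauzl's Entry fields; a Node Buffer is a Uint8Array.
interface RawEntry {
  fileName: Uint8Array;
  generalPurposeBitFlag: number;
}

type NameEncoding = "utf8" | "cp437";

const UTF8_FLAG = 0x0800; // bit 11: the name is declared to be UTF-8

// True if the bytes form well-formed UTF-8; a fatal decoder throws otherwise.
function isValidUtf8(bytes: Uint8Array): boolean {
  try {
    new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    return true;
  } catch {
    return false;
  }
}

// Flagged names are skipped. If any unflagged name is malformed as UTF-8,
// treat the archive's unflagged names as CP437; otherwise treat them as
// UTF-8 written without the flag.
function guessNameEncoding(entries: readonly RawEntry[]): NameEncoding {
  for (const entry of entries) {
    if ((entry.generalPurposeBitFlag & UTF8_FLAG) !== 0) continue;
    if (!isValidUtf8(entry.fileName)) return "cp437";
  }
  return "utf8";
}
```

When the guess is "utf8", `new TextDecoder("utf-8").decode(entry.fileName)` produces the string to pass on to path or fs functions, which avoids the `"path" argument` error. A "cp437" guess needs a CP437 decoding table instead. The heuristic has one known blind spot: a CP437 name whose bytes happen to form valid UTF-8 is classified as UTF-8.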