unzip Russian files


What can't you do right now?

In Russia, file names inside zip files are often encoded with cp866. fflate currently decodes such file names incorrectly. The best I can do is

  new TextDecoder('cp866').decode(strToU8(file.name))

but it produces correct characters interleaved with some gibberish.
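One plausible explanation for the interleaved gibberish: if the library turns each raw name byte into one character (a Latin-1 style decode, which is an assumption here and not confirmed in this thread), then re-encoding that string as UTF-8 expands every byte above 0x7F into two bytes, and cp866 renders the extra byte as a stray symbol. Below is a minimal sketch that maps each character back to a single byte before decoding. The helper names latin1ToBytes and decodeCp866Name are hypothetical, not part of fflate.

```javascript
// Assumption: `name` holds one character per raw byte (Latin-1 style),
// which is how a non-UTF-8 entry name is presumed to arrive here.
function latin1ToBytes(name) {
  const bytes = new Uint8Array(name.length);
  for (let i = 0; i < name.length; i++) {
    bytes[i] = name.charCodeAt(i) & 0xff; // keep the low byte only
  }
  return bytes;
}

// Decode the recovered bytes as cp866 (a WHATWG label for IBM866).
function decodeCp866Name(name) {
  return new TextDecoder('cp866').decode(latin1ToBytes(name));
}

// The cp866 bytes of a Russian word, seen as a byte-per-character string.
const mangled = '\x8f\xe0\xa8\xa2\xa5\xe2';
console.log(decodeCp866Name(mangled));

// For comparison: UTF-8 encoding the same string yields more bytes
// than the string has characters, which is where the noise comes from.
console.log(new TextEncoder().encode(mangled).length, 'vs', mangled.length);
```

fflate documents an optional latin1 flag on strToU8 that, if it behaves as described, would do the same job as latin1ToBytes; check it against the version you are using.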

An optimal solution

Either provide the raw name in UnzipFile:

{
    name: string, // as it is decoded now
    rawName: {
        bytes: Uint8Array,
        isUTF8: boolean
    },
    ondata: AsyncFlateStreamHandler,
    ...
}

or make it possible to provide a fallback encoding for entries that are not marked as UTF-8:

const unzip = new Unzip();
unzip.setFallbackEncoding('cp866'); // proposed method, does not exist yet
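To illustrate how calling code might use the first proposal, here is a minimal sketch. The rawName field is taken from the proposal above and is hypothetical (it is not known to exist in fflate), and resolveEntryName is an invented helper name.

```javascript
// Hypothetical entry shape from the proposal: `rawName` is NOT an existing
// fflate field, it only mirrors what the issue asks for.
function resolveEntryName(entry, fallbackEncoding) {
  if (entry.rawName.isUTF8) {
    return entry.name; // the library's own decoding is already right
  }
  // Otherwise decode the untouched bytes with the caller's encoding.
  return new TextDecoder(fallbackEncoding).decode(entry.rawName.bytes);
}

// Example entry whose name bytes are cp866-encoded Cyrillic plus ".txt".
const entry = {
  name: '\x8f\xe0\xa8\xa2\xa5\xe2.txt',
  rawName: {
    bytes: Uint8Array.of(0x8f, 0xe0, 0xa8, 0xa2, 0xa5, 0xe2, 0x2e, 0x74, 0x78, 0x74),
    isUTF8: false,
  },
};
console.log(resolveEntryName(entry, 'cp866'));
```

Exposing the raw bytes keeps the encoding decision entirely in user code, while a fallback-encoding setter would move the same TextDecoder call inside the library.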

(How) is this done by other libraries?

jszip also fails to decode it correctly.

Ubuntu's unzip supports unzip -O cp866 starting from some version. Before that version, I believe it had a hack that used cp866 automatically when it saw a Russian locale in the OS. A browser equivalent of that hack would be checking navigator.language == 'ru-RU', if you are willing to use that approach.
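The locale hack could be sketched roughly as follows. The names LEGACY_ZIP_ENCODINGS and guessFallbackEncoding are hypothetical, and only the 'ru' entry comes from this discussion; any other mapping would need its own evidence.

```javascript
// Primary language subtag -> guessed legacy zip file name encoding.
// Only 'ru' is backed by the discussion above; this table is an assumption.
const LEGACY_ZIP_ENCODINGS = new Map([['ru', 'cp866']]);

// Returns an encoding label for the given BCP 47 language tag,
// or null when there is no guess for it.
function guessFallbackEncoding(language) {
  const primary = String(language || '').toLowerCase().split('-')[0];
  return LEGACY_ZIP_ENCODINGS.get(primary) ?? null;
}

// In a browser this reads navigator.language; elsewhere it uses a sample tag.
const language = typeof navigator !== 'undefined' && navigator.language
  ? navigator.language
  : 'ru-RU';
console.log(guessFallbackEncoding(language));
```

Matching on the primary subtag rather than the exact string 'ru-RU' also covers tags such as plain 'ru'. It remains a guess, since the user's locale says nothing about where an archive was created.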

Issue Analytics

  • State: open
  • Created 2 years ago
  • Reactions: 1
  • Comments: 7 (3 by maintainers)

Top GitHub Comments

1 reaction
mixtur commented, Dec 17, 2021

https://github.com/vlm/zip-fix-filename-encoding/blob/master/src/runzip.c this might help a little bit. They are trying to guess an encoding by character frequencies there.
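As a rough illustration of guessing by character statistics, here is a much simplified sketch. It is not the algorithm used in the linked runzip.c. The candidate list, the scoring weights and the names cyrillicScore and guessEncoding are all assumptions made for the example.

```javascript
// Assumed candidate set; a real detector would likely consider more encodings.
const CANDIDATES = ['cp866', 'windows-1251'];

// Crude score: reward basic Cyrillic letters, penalise other non-ASCII
// characters (box drawing, rare letters), ignore ASCII.
function cyrillicScore(text) {
  let score = 0;
  for (const ch of text) {
    const code = ch.codePointAt(0);
    if (code >= 0x0430 && code <= 0x044f) score += 2;      // lowercase U+0430..U+044F
    else if (code >= 0x0410 && code <= 0x042f) score += 1; // uppercase U+0410..U+042F
    else if (code > 0x7f) score -= 1;                      // any other non-ASCII
  }
  return score;
}

// Decode the bytes with every candidate and keep the best-scoring result.
function guessEncoding(bytes) {
  let best = null;
  for (const encoding of CANDIDATES) {
    const text = new TextDecoder(encoding).decode(bytes);
    const score = cyrillicScore(text);
    if (best === null || score > best.score) {
      best = { encoding, text, score };
    }
  }
  return best;
}

// The same Russian word encoded as cp866 bytes.
const sample = Uint8Array.of(0x8f, 0xe0, 0xa8, 0xa2, 0xa5, 0xe2);
console.log(guessEncoding(sample));
```

Short names give such a heuristic very little evidence, so scoring all names in an archive together would probably be more reliable than scoring each one separately.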

Also, there are some test files that might be useful: https://github.com/Stuk/jszip/tree/master/test/ref. In particular, local_encoding_in_name.zip has Russian file names inside. I think it is encoded with cp866, according to the jszip tests.

I was probably wrong about jszip in the first comment. Apparently they handle it somehow (or at least they have tests for it), yet for some of my files jszip still produces wrong file names, and those files are definitely cp866.

0 reactions
mixtur commented, Jan 26, 2022

I think it is tempting for archiver authors to use one-byte encodings to save a few more bytes, so the problem is not going away any time soon.

But yeah, putting the problem on the user is fine by me too.


Top Results From Across the Web

how to extract zip file with Russian characters??? #307 - GitHub
Steps to reproduce i have a zip file with russian file name inside Expected behavior FastZip fz = new FastZip(); fz.

How to unzip a file written in Russian
I am currently using Winrar to unzip charts downloaded from a site, but have found one that I really want to download but...

How to decompress a ZIP file with specified file/directory name ...
First, extract the archive using bsdtar , since the unzip tool seems to mangle the file names, while bsdtar will extract them raw....

15 Best FREE Unzip Programs | Zip File Opener To Unzip Files
Free unzip programs let you extract any number of files within a compressed file with extensions like ZIP, RAR, 7Z, etc. Compressed files...

How can I correctly decompress a ZIP archive of files with ...
I managed to extract .zip file correctly with LANG=ru_RU.CP1251; unzip Bleed.zip (it was Cyrillic encoding in my case). Now I wonder how ...
