option to emit raw string buffers instead of decoded strings
I’m using version 2.6.0 FWIW, node 6.
± node debug bin.js foo.zip
< Debugger listening on [::]:5858
connecting to 127.0.0.1:5858 ... ok
break in bin.js:2
1
> 2 'use strict'
3 const extractExec = require('./')
4 const fs = require('fs')
c
break in index.js:46
44 // TODO: what if we get multiple plists?
45 const plist = plists[0]
>46 debugger
47 getExecStream(fd, plist.CFBundleExecutable, (err, entry, exec) => {
48 debugger
c
break in index.js:19
17 zip.on('entry', function onentry (entry) {
18 if ((/XXXThing.*app\/XXXThing-.*/i).test(entry.fileName)) {
>19 debugger;
20 }
21 if (!isOurExec(entry, execname)) { return }
repl
Press Ctrl + C to leave debug repl
> entry.fileName
'Payload/XXXThing-╬▓.app/XXXThing-╬▓'
> execname
'XXXThing-β'
As you can see, execname is right, but entry.fileName has not been decoded as UTF-8, AFAICT.
Issue Analytics
- State:
- Created 7 years ago
- Comments:8 (4 by maintainers)
Top GitHub Comments
Better proposal: add an option to open() that leaves all strings undecoded as Buffer objects instead of strings. Then you can use any kind of encoding guesser or assume UTF-8 as you wish. I think this is the right solution to this issue.

Interesting bug report. The behavior you’re seeing from Info-Zip is actually non-standard behavior. yauzl is behaving “correctly” with respect to the zipfile specification.
There are multiple ways for a zipfile to indicate that the filenames are encoded in utf-8, and your zipfile does none of them. According to the spec, if no charset is specified, then cp437 is to be used, which is what yauzl is doing.
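To make the cp437 fallback concrete, here is a minimal sketch that reproduces the garbled name from the debugger session. The names CP437_HIGH and decodeCp437 are illustrative and are not part of yauzl's API; the table covers only bytes 0x80-0xFF and passes lower bytes through as ASCII, which is a simplification.

```javascript
'use strict'

// High half of code page 437 (bytes 0x80-0xFF), 16 characters per row.
// Illustrative table for this sketch, not taken from yauzl's source.
const CP437_HIGH = [
  'ÇüéâäàåçêëèïîìÄÅ',
  'ÉæÆôöòûùÿÖÜ¢£¥₧ƒ',
  'áíóúñÑªº¿⌐¬½¼¡«»',
  '░▒▓│┤╡╢╖╕╣║╗╝╜╛┐',
  '└┴┬├─┼╞╟╚╔╩╦╠═╬╧',
  '╨╤╥╙╘╒╓╫╪┘┌█▄▌▐▀',
  'αßΓπΣσµτΦΘΩδ∞φε∩',
  '≡±≥≤⌠⌡÷≈°∙·√ⁿ²■\u00a0'
].join('')

// Decode a Buffer byte by byte: bytes below 0x80 are treated as ASCII,
// everything else is looked up in the cp437 table above.
function decodeCp437 (buffer) {
  let result = ''
  for (const byte of buffer) {
    result += byte < 0x80 ? String.fromCharCode(byte) : CP437_HIGH[byte - 0x80]
  }
  return result
}

// 'β' is the two bytes 0xCE 0xB2 in UTF-8; cp437 maps them to '╬' and '▓'.
const rawName = Buffer.from('XXXThing-β', 'utf8')
console.log(decodeCp437(rawName))     // 'XXXThing-╬▓'
console.log(rawName.toString('utf8')) // 'XXXThing-β'
```

The same bytes give either string depending on which charset the reader assumes, which is why the flag in the zipfile matters.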
I’m not sure why Info-Zip’s unzip is making an assumption about the filename being UTF-8. I’ve read the man page and even spent some time searching the source for the reason for that behavior. The closest I came is an excerpt from the zip man page, which may or may not be relevant:

So the question remains, what should yauzl do in this situation? Should the spec be considered correct, or should “in practice” behavior of popular tools be considered correct? It’s a tough call, but I’m leaning toward the spec.
If you’d like to fix your zipfile, try setting general purpose bit 11 in all the entries. That is what yazl does to indicate the filename is to be decoded using UTF-8. If you’re creating the zipfile at a higher level than that, then I suggest using a different library/utility for creating zipfiles, because the one you’re using is non-conformant. If you didn’t make the zipfile at all, but you got it from a user, then I suggest you forward this paragraph to your user.
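As a rough illustration of what "setting general purpose bit 11" means at the byte level, the sketch below inspects and sets that bit in a single central directory file header. The helper names are hypothetical, and the header here is synthetic; a real repair would also need to locate every header in the archive and patch the matching local file headers, where the same flag field sits at offset 6.

```javascript
'use strict'

const CENTRAL_HEADER_SIGNATURE = 0x02014b50 // "PK\x01\x02"
const FLAGS_OFFSET = 8                      // general purpose bit flag, 2 bytes, little-endian
const UTF8_FLAG = 1 << 11                   // bit 11, i.e. 0x0800

// Throw unless the buffer starts with a central directory file header.
function assertCentralHeader (header) {
  if (header.readUInt32LE(0) !== CENTRAL_HEADER_SIGNATURE) {
    throw new Error('not a central directory file header')
  }
}

// True when the entry declares its file name and comment as UTF-8.
function isUtf8Flagged (header) {
  assertCentralHeader(header)
  return (header.readUInt16LE(FLAGS_OFFSET) & UTF8_FLAG) !== 0
}

// Set bit 11 in place, leaving the other flag bits untouched.
function setUtf8Flag (header) {
  assertCentralHeader(header)
  const flags = header.readUInt16LE(FLAGS_OFFSET)
  header.writeUInt16LE(flags | UTF8_FLAG, FLAGS_OFFSET)
  return header
}

// Synthetic 46-byte header (the fixed-size part) with all flags cleared.
const header = Buffer.alloc(46)
header.writeUInt32LE(CENTRAL_HEADER_SIGNATURE, 0)
console.log(isUtf8Flagged(header)) // false
setUtf8Flag(header)
console.log(isUtf8Flagged(header)) // true
```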
I haven’t seen general purpose bit 11 mishandled like this in any existing zipfile utility I’ve tested this with. I can’t say for sure, but I believe I’ve tested this issue with Info-Zip’s zip, Windows Compressed Folder, Mac’s Archive Utility, and 7-Zip. I’m not as familiar with Java’s ZipFile class, python’s zipfile module, or WinRAR.

So I don’t know how this zipfile came to exist with the filename encoding messed up, but I really don’t think I should follow in Info-Zip’s nonstandard footsteps on this matter. Following the spec is one of yauzl’s design principles, and cp437 support is a feature.
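For the Buffer-based proposal in the first comment, here is a sketch of what the caller side could look like once file names arrive undecoded. It assumes the raw name is available as a Buffer (the thread only proposes this; no option name is given there), and the helpers isValidUtf8 and decodeRawFileName are hypothetical. The latin1 fallback is just a stand-in for whatever decoder or encoding guesser the caller prefers, such as cp437.

```javascript
'use strict'

// Node replaces invalid UTF-8 sequences with U+FFFD when decoding, so a
// buffer is valid UTF-8 exactly when it survives a decode/encode round trip.
function isValidUtf8 (buffer) {
  return Buffer.from(buffer.toString('utf8'), 'utf8').equals(buffer)
}

// Prefer UTF-8 when the bytes allow it, otherwise defer to the caller's
// fallback decoder.
function decodeRawFileName (raw, fallbackDecode) {
  return isValidUtf8(raw) ? raw.toString('utf8') : fallbackDecode(raw)
}

const latin1 = (buffer) => buffer.toString('latin1') // stand-in fallback

const utf8Name = Buffer.from('Payload/XXXThing-β.app/XXXThing-β', 'utf8')
console.log(decodeRawFileName(utf8Name, latin1)) // 'Payload/XXXThing-β.app/XXXThing-β'

const legacyName = Buffer.from([0x58, 0xe1]) // 0xE1 alone is not valid UTF-8
console.log(decodeRawFileName(legacyName, latin1)) // 'Xá'
```

Guessing like this can misfire on legacy names that happen to be valid UTF-8, which is the trade-off the spec-first position above avoids.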