option to emit raw string buffers instead of decoded strings

I’m using yauzl version 2.6.0, FWIW, on Node 6.

± node debug bin.js foo.zip
< Debugger listening on [::]:5858
connecting to 127.0.0.1:5858 ... ok
break in bin.js:2
  1
> 2 'use strict'
  3 const extractExec = require('./')
  4 const fs = require('fs')
c
break in index.js:46
 44     // TODO: what if we get multiple plists?
 45     const plist = plists[0]
>46     debugger
 47     getExecStream(fd, plist.CFBundleExecutable, (err, entry, exec) => {
 48       debugger
c
break in index.js:19
 17     zip.on('entry', function onentry (entry) {
 18       if ((/XXXThing.*app\/XXXThing-.*/i).test(entry.fileName)) {
>19         debugger;
 20       }
 21       if (!isOurExec(entry, execname)) { return }
repl
Press Ctrl + C to leave debug repl
> entry.fileName
'Payload/XXXThing-╬▓.app/XXXThing-╬▓'
> execname
'XXXThing-β'

As you can see, execname is correct, but entry.fileName is not decoded as UTF-8, AFAICT.
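The garbled name is what the UTF-8 bytes of `XXXThing-β` look like when they are read as cp437, the charset the zip spec assigns when no other encoding is indicated. Below is a minimal Node.js sketch of that misreading. It uses a deliberately partial, hypothetical two-entry cp437 lookup (`CP437_PARTIAL`) rather than the full table, and it treats bytes below 0x80 as ASCII.

```javascript
// The name the reporter expected; the zip tool stored it as UTF-8 bytes.
const expected = 'XXXThing-β';
const raw = Buffer.from(expected, 'utf8');

// 'β' (U+03B2) is the two bytes 0xCE 0xB2 in UTF-8.
console.log(raw.subarray(-2)); // <Buffer ce b2>

// Deliberately partial cp437 lookup covering only the two bytes seen here.
const CP437_PARTIAL = new Map([
  [0xce, '╬'],
  [0xb2, '▓'],
]);

function decodeCp437Partial(buf) {
  let out = '';
  for (const byte of buf) {
    if (byte < 0x80) {
      // Treated as ASCII; this matches cp437 for printable characters.
      out += String.fromCharCode(byte);
    } else if (CP437_PARTIAL.has(byte)) {
      out += CP437_PARTIAL.get(byte);
    } else {
      throw new RangeError(`byte 0x${byte.toString(16)} is not in this partial table`);
    }
  }
  return out;
}

console.log(decodeCp437Partial(raw)); // XXXThing-╬▓
```

One UTF-8 character becomes two cp437 glyphs. This is exactly the `╬▓` pair in the debugger output.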

Issue Analytics: created 7 years ago; 8 comments (4 by maintainers).

Top GitHub Comments

thejoshwolfe commented, Oct 11, 2016

Better proposal: add an option to open() that leaves all strings undecoded as Buffer objects instead of strings. Then you can use any kind of encoding guesser or assume UTF-8 as you wish. I think this is the right solution to this issue.
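If `open()` returned `fileName` as a raw Buffer, as proposed here, each caller could apply its own decoding policy. The sketch below shows one such policy: try strict UTF-8 first, and otherwise defer to a fallback decoder that the caller supplies. It assumes a modern Node.js where `TextDecoder` is a global (Node 6, from the report, predates that). The helpers `tryDecodeUtf8` and `guessFileName` are hypothetical and are not part of yauzl.

```javascript
// With fatal: true, decode() throws a TypeError on malformed UTF-8
// instead of silently inserting U+FFFD replacement characters.
const strictUtf8 = new TextDecoder('utf-8', { fatal: true });

// Hypothetical helper: the UTF-8 reading of raw filename bytes,
// or null if the bytes are not valid UTF-8.
function tryDecodeUtf8(raw) {
  try {
    return strictUtf8.decode(raw);
  } catch (err) {
    if (err instanceof TypeError) return null;
    throw err;
  }
}

// Hypothetical helper: prefer UTF-8, otherwise use the caller's decoder
// (for example a cp437 decoder, which is what the zip spec prescribes).
function guessFileName(raw, fallbackDecode) {
  const utf8 = tryDecodeUtf8(raw);
  return utf8 !== null ? utf8 : fallbackDecode(raw);
}

console.log(guessFileName(Buffer.from('XXXThing-β', 'utf8'), () => '?')); // XXXThing-β
```

This is only a heuristic. A genuine cp437 name containing `╬▓` is stored as the bytes 0xCE 0xB2, which are also valid UTF-8 for `β`, so a guesser like this would misread it.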

thejoshwolfe commented, Oct 5, 2016

Interesting bug report. The behavior you’re seeing from Info-Zip is actually non-standard behavior. yauzl is behaving “correctly” with respect to the zipfile specification.

There are multiple ways for a zipfile to indicate that the filenames are encoded in utf-8, and your zipfile does none of them. According to the spec, if no charset is specified, then cp437 is to be used, which is what yauzl is doing.
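The spec-following behavior described above comes down to checking general purpose bit 11. The sketch below implements that check with a full cp437 table for bytes 0x80 to 0xFF. It makes two simplifications: it reads lower bytes as plain ASCII, and it ignores the other UTF-8 signal, Info-ZIP’s Unicode Path extra field (0x7075). `decodeCp437` and `decodeFileName` are illustrative names, not yauzl’s internals.

```javascript
// cp437 glyphs for bytes 0x80-0xFF, one row of 16 per line.
const CP437_HIGH =
  'ÇüéâäàåçêëèïîìÄÅ' + // 0x80
  'ÉæÆôöòûùÿÖÜ¢£¥₧ƒ' + // 0x90
  'áíóúñÑªº¿⌐¬½¼¡«»' + // 0xA0
  '░▒▓│┤╡╢╖╕╣║╗╝╜╛┐' + // 0xB0
  '└┴┬├─┼╞╟╚╔╩╦╠═╬╧' + // 0xC0
  '╨╤╥╙╘╒╓╫╪┘┌█▄▌▐▀' + // 0xD0
  'αßΓπΣσµτΦΘΩδ∞φε∩' + // 0xE0
  '≡±≥≤⌠⌡÷≈°∙·√ⁿ²■\u00a0'; // 0xF0 (0xFF is a no-break space)

function decodeCp437(buf) {
  let out = '';
  for (const byte of buf) {
    // Simplification: bytes below 0x80 are read as ASCII. Real cp437 draws
    // glyphs such as ☺ for 0x01-0x1F and ⌂ for 0x7F.
    out += byte < 0x80 ? String.fromCharCode(byte) : CP437_HIGH[byte - 0x80];
  }
  return out;
}

// General purpose bit 11 (0x0800): filename and comment are UTF-8.
const UTF8_FLAG = 0x0800;

function decodeFileName(raw, generalPurposeBitFlag) {
  // Without bit 11, the spec's default charset, cp437, applies.
  return (generalPurposeBitFlag & UTF8_FLAG) !== 0
    ? raw.toString('utf8')
    : decodeCp437(raw);
}

const raw = Buffer.from('XXXThing-β', 'utf8');
console.log(decodeFileName(raw, 0x0000)); // XXXThing-╬▓
console.log(decodeFileName(raw, 0x0800)); // XXXThing-β
```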

I’m not sure why Info-Zip’s unzip is making an assumption about the filename being UTF-8. I’ve read the man page and even spent some time searching the source for the reason for that behavior. The closest I came is an excerpt from the zip man page, which may or may not be relevant:

Though the zip standard requires storing paths in an archive using a specific character set, in practice zips have stored paths in archives in whatever the local character set is.

So the question remains, what should yauzl do in this situation? Should the spec be considered correct, or should “in practice” behavior of popular tools be considered correct? It’s a tough call, but I’m leaning toward the spec.

If you’d like to fix your zipfile, try setting general purpose bit 11 in all the entries. That is what yazl does to indicate the filename is to be decoded using utf8. If you’re creating the zipfile at a higher level than that, then I suggest using a different library/utility for creating zipfiles, because the one you’re using is non-conformant. If you didn’t make the zipfile at all, but you got it from a user, then I suggest you forward this paragraph to your user.
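Setting bit 11 means OR-ing 0x0800 into the 2-byte little-endian flag field of each entry’s local file header (at offset 6) and central directory header (at offset 8). The sketch below assumes you have already located each header’s offset; walking the end-of-central-directory record to find them is out of scope. `setUtf8Flag` is a hypothetical helper. It changes only the flag, so it is correct only if the stored filename bytes really are UTF-8.

```javascript
// Header signatures as little-endian uint32 ("PK\x03\x04" and "PK\x01\x02").
const LOCAL_HEADER_SIG = 0x04034b50;
const CENTRAL_HEADER_SIG = 0x02014b50;
const UTF8_FLAG = 0x0800; // general purpose bit 11

// Hypothetical helper: sets bit 11 on the header starting at `offset` in `buf`
// (modifying `buf` in place) and returns the new flag value.
function setUtf8Flag(buf, offset) {
  const sig = buf.readUInt32LE(offset);
  let flagOffset;
  if (sig === LOCAL_HEADER_SIG) {
    flagOffset = offset + 6; // local file header: flags follow "version needed"
  } else if (sig === CENTRAL_HEADER_SIG) {
    flagOffset = offset + 8; // central header also has "version made by"
  } else {
    throw new Error(`no zip header signature at offset ${offset}`);
  }
  const flags = buf.readUInt16LE(flagOffset) | UTF8_FLAG;
  buf.writeUInt16LE(flags, flagOffset);
  return flags;
}
```

Apply the helper to both headers of every entry so that the two copies of the flag agree.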

I haven’t seen general purpose bit 11 mishandled like this in any existing zipfile utility I’ve tested this with. I can’t say for sure, but I believe I’ve tested this issue with Info-Zip’s zip, Windows Compressed Folder, Mac’s Archive Utility, and 7-Zip. I’m not as familiar with Java’s ZipFile class, Python’s zipfile module, or WinRAR.

So I don’t know how this zipfile came to exist with the filename encoding messed up, but I really don’t think I should follow in Info-Zip’s nonstandard footsteps on this matter. Following the spec is one of yauzl’s design principles, and cp437 support is a feature.