option to emit raw string buffers instead of decoded strings

I’m using version 2.6.0 FWIW, node 6.

± node debug bin.js foo.zip
< Debugger listening on [::]:5858
connecting to 127.0.0.1:5858 ... ok
break in bin.js:2
  1
> 2 'use strict'
  3 const extractExec = require('./')
  4 const fs = require('fs')
c
break in index.js:46
 44     // TODO: what if we get multiple plists?
 45     const plist = plists[0]
>46     debugger
 47     getExecStream(fd, plist.CFBundleExecutable, (err, entry, exec) => {
 48       debugger
c
break in index.js:19
 17     zip.on('entry', function onentry (entry) {
 18       if ((/XXXThing.*app\/XXXThing-.*/i).test(entry.fileName)) {
>19         debugger;
 20       }
 21       if (!isOurExec(entry, execname)) { return }
repl
Press Ctrl + C to leave debug repl
> entry.fileName
'Payload/XXXThing-╬▓.app/XXXThing-╬▓'
> execname
'XXXThing-β'

As you can see, execname is correct, but entry.fileName has not been decoded as UTF-8, AFAICT.

Issue Analytics

  • State: closed
  • Created 7 years ago
  • Comments: 8 (4 by maintainers)

Top GitHub Comments

thejoshwolfe commented, Oct 11, 2016

Better proposal: add an option to open() that leaves all strings undecoded as Buffer objects instead of strings. Then you can use any kind of encoding guesser or assume UTF-8 as you wish. I think this is the right solution to this issue.
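A minimal sketch of what a caller might do once filenames arrive as raw Buffer objects: attempt a strict UTF-8 decode and, if the bytes are not valid UTF-8, keep the raw bytes for some other guesser. The function name guessFileName and the shape of its return value are hypothetical, chosen only for illustration. Later yauzl releases document a decodeStrings: false option for open(); treat that as something to verify against the README of the version you use.

```javascript
const { TextDecoder } = require('util');

// fatal: true makes decode() throw on invalid UTF-8 instead of
// silently substituting U+FFFD replacement characters.
const strictUtf8 = new TextDecoder('utf-8', { fatal: true });

// Hypothetical helper: rawName is assumed to be the undecoded filename Buffer.
function guessFileName(rawName) {
  try {
    return { encoding: 'utf8', fileName: strictUtf8.decode(rawName) };
  } catch (err) {
    // Not valid UTF-8: hand the raw bytes back so the caller can try
    // CP437 or any other charset guesser.
    return { encoding: 'unknown', fileName: null, raw: rawName };
  }
}

const utf8Guess = guessFileName(Buffer.from('Payload/XXXThing-β.app/XXXThing-β', 'utf8'));
const badGuess = guessFileName(Buffer.from([0x66, 0x6f, 0x6f, 0xff]));

console.log(utf8Guess.encoding, utf8Guess.fileName);
console.log(badGuess.encoding);
```

Note that a successful UTF-8 decode is only a heuristic: a name containing nothing but ASCII bytes decodes identically under both UTF-8 and CP437, and some CP437 byte sequences happen to be valid UTF-8 as well.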

thejoshwolfe commented, Oct 5, 2016

Interesting bug report. The behavior you’re seeing from Info-Zip is actually non-standard behavior. yauzl is behaving “correctly” with respect to the zipfile specification.

There are multiple ways for a zipfile to indicate that the filenames are encoded in utf-8, and your zipfile does none of them. According to the spec, if no charset is specified, then cp437 is to be used, which is what yauzl is doing.
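To illustrate why the filename in the report comes out as it does, here is a small sketch that takes the UTF-8 bytes of the expected name and decodes them as CP437. Node has no built-in CP437 decoder, so the table below is a deliberately partial, hand-written one covering only the two high bytes this example needs; decodeCp437Partial is a hypothetical helper, not part of yauzl.

```javascript
// Partial CP437 table: only the two high bytes needed for this demonstration.
// In CP437, 0xCE is '╬' and 0xB2 is '▓'.
const CP437_PARTIAL = { 0xb2: '▓', 0xce: '╬' };

// Hypothetical helper: decodes ASCII bytes plus the bytes listed above.
function decodeCp437Partial(buf) {
  let out = '';
  for (const byte of buf) {
    if (byte < 0x80) {
      out += String.fromCharCode(byte);
    } else if (byte in CP437_PARTIAL) {
      out += CP437_PARTIAL[byte];
    } else {
      throw new Error('byte 0x' + byte.toString(16) + ' is not in the partial table');
    }
  }
  return out;
}

// 'β' is encoded in UTF-8 as the two bytes 0xCE 0xB2.
const rawName = Buffer.from('XXXThing-β', 'utf8');

const asCp437 = decodeCp437Partial(rawName);
const asUtf8 = rawName.toString('utf8');

console.log(asCp437);
console.log(asUtf8);
```

The same bytes give the '╬▓' seen in the debugger when read as CP437 and the expected 'β' when read as UTF-8, which is consistent with a zipfile that stores UTF-8 names without declaring them as such.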

I’m not sure why Info-Zip’s unzip is making an assumption about the filename being UTF-8. I’ve read the man page and even spent some time searching the source for the reason for that behavior. The closest I came is an excerpt from the zip man page, which may or may not be relevant:

Though the zip standard requires storing paths in an archive using a specific character set, in practice zips have stored paths in archives in whatever the local character set is.

So the question remains, what should yauzl do in this situation? Should the spec be considered correct, or should “in practice” behavior of popular tools be considered correct? It’s a tough call, but I’m leaning toward the spec.

If you’d like to fix your zipfile, try setting general purpose bit 11 in all the entries. That is what yazl does to indicate that the filename is to be decoded as UTF-8. If you’re creating the zipfile at a higher level than that, then I suggest using a different library/utility for creating zipfiles, because the one you’re using is non-conformant. If you didn’t make the zipfile at all, but got it from a user, then I suggest you forward this paragraph to your user.
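As a rough sketch of what "setting general purpose bit 11" means at the byte level: the general purpose bit flag is a 16-bit little-endian field at offset 6 of a local file header and offset 8 of a central directory file header, and bit 11 is the mask 0x0800. The helper names below are hypothetical, and the code operates on a single header that has already been located; finding every header in a real archive is left out.

```javascript
const LOCAL_HEADER_SIG = 0x04034b50;   // "PK\x03\x04"
const CENTRAL_HEADER_SIG = 0x02014b50; // "PK\x01\x02"
const UTF8_FLAG = 0x0800;              // general purpose bit 11

// Returns the byte offset of the general purpose bit flag within a header.
function flagOffset(header) {
  const sig = header.readUInt32LE(0);
  if (sig === LOCAL_HEADER_SIG) return 6;
  if (sig === CENTRAL_HEADER_SIG) return 8;
  throw new Error('not a zip entry header');
}

function hasUtf8Flag(header) {
  return (header.readUInt16LE(flagOffset(header)) & UTF8_FLAG) !== 0;
}

// Sets bit 11 in place, leaving the other flag bits untouched.
function setUtf8Flag(header) {
  const off = flagOffset(header);
  header.writeUInt16LE(header.readUInt16LE(off) | UTF8_FLAG, off);
}

// Demo on a zeroed 30-byte local file header (the fixed-size part only).
const demoHeader = Buffer.alloc(30);
demoHeader.writeUInt32LE(LOCAL_HEADER_SIG, 0);

const before = hasUtf8Flag(demoHeader);
setUtf8Flag(demoHeader);
const after = hasUtf8Flag(demoHeader);

console.log(before, after);
```

In a real archive the flag appears in both the local file header and the central directory header of each entry, so a repair tool would presumably want to set it in both places for every entry.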

I haven’t seen general purpose bit 11 mishandled like this in any existing zipfile utility I’ve tested. I can’t say for sure, but I believe I’ve tested this issue with Info-Zip’s zip, Windows Compressed Folder, Mac’s Archive Utility, and 7-Zip. I’m not as familiar with Java’s ZipFile class, Python’s zipfile module, or WinRAR.

So I don’t know how this zipfile came to exist with the filename encoding messed up, but I really don’t think I should follow in Info-Zip’s nonstandard footsteps on this matter. Following the spec is one of yauzl’s design principles, and cp437 support is a feature.
