
Switch to faster hashing for deflate


I saw what was mentioned in #135 and #136 and decided to try out the performance benchmarks for myself.

Selected samples: (1 of 1)
 > lorem_1mb

Sample: lorem_1mb.txt (1000205 bytes raw / ~257018 bytes compressed)
 > deflate-fflate x 29.27 ops/sec ±1.97% (52 runs sampled)
 > deflate-fflate-string x 25.85 ops/sec ±2.58% (47 runs sampled)
 > deflate-imaya x 7.41 ops/sec ±2.72% (22 runs sampled)
 > deflate-pako x 16.39 ops/sec ±1.44% (44 runs sampled)
 > deflate-pako-string x 15.05 ops/sec ±0.50% (41 runs sampled)
 > deflate-pako-untyped x 11.53 ops/sec ±1.06% (31 runs sampled)
 > deflate-uzip x 29.20 ops/sec ±2.41% (51 runs sampled)
 > deflate-zlib x 25.09 ops/sec ±0.41% (45 runs sampled)
 > inflate-fflate x 252 ops/sec ±1.07% (87 runs sampled)
 > inflate-fflate-string x 141 ops/sec ±0.75% (78 runs sampled)
 > inflate-imaya x 159 ops/sec ±1.37% (83 runs sampled)
 > inflate-pako x 193 ops/sec ±0.32% (87 runs sampled)
 > inflate-pako-string x 59.54 ops/sec ±0.31% (61 runs sampled)
 > inflate-pako-untyped x 65.32 ops/sec ±1.22% (64 runs sampled)
 > inflate-uzip x 274 ops/sec ±0.73% (86 runs sampled)
 > inflate-zlib x 589 ops/sec ±0.29% (93 runs sampled)

If you would like to experiment with the patches I made, I forked the repo: 101arrowz/pako.

Note the addition of fflate and uzip. uzip was mentioned in the previous issues but is, as you noted, somewhat unstable: it hangs on bad input rather than throwing an error, offers little configuration, cannot be used asynchronously or in parallel (multiple calls to inflate in separate callbacks, for example, cause it to fail), and, worst of all, does not support streaming.

I created fflate to resolve these issues while implementing some of the popular features requested here (e.g. ES6 modules). In addition, fflate adds asynchronous support via Web and Node workers (offloading to a separate thread entirely rather than just deferring on the event loop), plus parallelized ZIP support that easily beats JSZip. The bundle size also ends up much smaller than pako's because of the much simpler code. And as the benchmarks above show, it is much faster than pako.

The purpose of this issue is not to boast about my library or to call pako bad. I believe pako can become much faster while still staying closer than fflate or uzip to the real zlib (i.e. the internal lib/ directory). The overhauled implementation in Node 12, as mentioned in #193, is the perfect chance to escape the trap of matching the C source nearly line-for-line and instead become a canonical but high-performance rework in JavaScript. For this reason, I suggest optimizing some of the slower parts of the library. I found an even better hashing function that produced very few collisions in my testing, and for larger files/image files it outperforms even Node.js zlib in C; if you'd like to discuss, please let me know.

Personally: thank you for your incredible contributions to the open source community. I use pako in multiple projects and greatly appreciate its stability, not present in libraries such as my own.

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 26 (15 by maintainers)

Top GitHub Comments

1 reaction
puzrin commented, Nov 14, 2020

I found the reason for the inflate speed decrease: the compression ratio is much worse (a bigger inflate input means slower inflate).

let HASH_ZLIB = (s, prev, data) => ((prev << s.hash_shift) ^ data) & s.hash_mask;

pako deflate level 6, memLevel 8 => 257018 bytes


let HASH_FAST2 = (s, prev, data) => crc16(prev, data) & s.hash_mask;

pako deflate level 6, memLevel 8 => 536479 bytes


let HASH_FAST = (s, prev, data) => ((prev << 8) + (prev >> 8) + (data << 4)) & s.hash_mask;

pako deflate level 6, memLevel 8 => 536897 bytes

Increasing memLevel to 9 (for a 64K hash table) does not help.
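This behavior can be illustrated without pako's internals (synthetic data, illustrative names): deflate finds matches by chaining positions whose 3-byte window hashed to the same slot. The zlib hash depends on exactly the last 3 bytes (3 × hash_shift = hash_bits, so older bytes shift out), so a repeated trigram always lands in the same chain. HASH_FAST mixes in older history, so the same trigram at different positions usually lands in different slots and the match finder misses it — consistent with the output growing from 257018 to 536897 bytes above.

```javascript
// Illustration only, not pako internals.
const HASH_BITS = 15;                 // memLevel 8 => 8 + 7 bits
const HASH_MASK = (1 << HASH_BITS) - 1;
const HASH_SHIFT = 5;                 // ceil(HASH_BITS / MIN_MATCH), MIN_MATCH = 3

const HASH_ZLIB = (prev, data) => ((prev << HASH_SHIFT) ^ data) & HASH_MASK;
const HASH_FAST = (prev, data) => ((prev << 8) + (prev >> 8) + (data << 4)) & HASH_MASK;

// Fraction of repeated trigrams that hash to the same slot as their
// previous occurrence (1.0 is ideal for match finding).
function trigramConsistency(hash, buf) {
  const hashes = [];
  let h = 0;
  for (let i = 0; i < buf.length; i++) {
    h = hash(h, buf[i]);        // roll the hash one byte, as deflate does
    hashes.push(h);             // hashes[i] covers the trigram ending at i
  }
  const last = new Map();       // trigram -> hash at its previous occurrence
  let same = 0, total = 0;
  for (let i = 2; i < buf.length; i++) {
    const tri = buf.toString('latin1', i - 2, i + 1);
    if (last.has(tri)) {
      total++;
      if (last.get(tri) === hashes[i]) same++;
    }
    last.set(tri, hashes[i]);
  }
  return same / total;
}

const sample = Buffer.from('the quick brown fox jumps over the lazy dog '.repeat(500));
console.log('HASH_ZLIB consistency:', trigramConsistency(HASH_ZLIB, sample)); // 1
console.log('HASH_FAST consistency:', trigramConsistency(HASH_FAST, sample));
```

So the faster hash wins per call but loses overall: broken chains mean fewer matches, a much larger deflate output, and therefore slower inflate too.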

1 reaction
101arrowz commented, Nov 8, 2020

I’ve started to experiment with the codebase and will report back if I make progress. The hashing is not centralized (it is set in many different places), so I need to deep-dive into the code to tune its performance.

On another note, I saw you pushed some new commits for v2.0.0. If you’re going to be releasing a new version, I’d recommend you consider a new idea I came up with: auto-workerization.

Typical workerization techniques do not let you reuse existing synchronous code asynchronously, but I developed a function that can take a function (along with all of its dependencies), generate an inline worker string, and cache it for higher performance. It has been specifically optimized to work even after mangling, minification, and usage in any bundler. More importantly, you can reference other functions, classes, and static constants with this new method.

The main reason I suggest this is that very, very few people want to cause the environment (which could very well be a user’s browser) to hang while running a CPU intensive task on the main thread, such as deflate. Offering a method to offload this to a worker automatically is incredibly useful for many less-experienced developers who don’t understand workers and just want high performance and good UX.

You can see how I did it in fflate if you’d like to consider it.
