
Switch to faster hashing for deflate


I saw what was mentioned in #135 and #136 and decided to try out the performance benchmarks for myself.

Selected samples: (1 of 1)
 > lorem_1mb

Sample: lorem_1mb.txt (1000205 bytes raw / ~257018 bytes compressed)
 > deflate-fflate x 29.27 ops/sec ±1.97% (52 runs sampled)
 > deflate-fflate-string x 25.85 ops/sec ±2.58% (47 runs sampled)
 > deflate-imaya x 7.41 ops/sec ±2.72% (22 runs sampled)
 > deflate-pako x 16.39 ops/sec ±1.44% (44 runs sampled)
 > deflate-pako-string x 15.05 ops/sec ±0.50% (41 runs sampled)
 > deflate-pako-untyped x 11.53 ops/sec ±1.06% (31 runs sampled)
 > deflate-uzip x 29.20 ops/sec ±2.41% (51 runs sampled)
 > deflate-zlib x 25.09 ops/sec ±0.41% (45 runs sampled)
 > inflate-fflate x 252 ops/sec ±1.07% (87 runs sampled)
 > inflate-fflate-string x 141 ops/sec ±0.75% (78 runs sampled)
 > inflate-imaya x 159 ops/sec ±1.37% (83 runs sampled)
 > inflate-pako x 193 ops/sec ±0.32% (87 runs sampled)
 > inflate-pako-string x 59.54 ops/sec ±0.31% (61 runs sampled)
 > inflate-pako-untyped x 65.32 ops/sec ±1.22% (64 runs sampled)
 > inflate-uzip x 274 ops/sec ±0.73% (86 runs sampled)
 > inflate-zlib x 589 ops/sec ±0.29% (93 runs sampled)

If you would like to experiment with the patches I made, I forked the repo: 101arrowz/pako.

Note the addition of fflate and uzip. uzip was mentioned in the previous issues but is, as you noted, somewhat unstable: it hangs on bad input rather than throwing an error, offers little configuration, cannot be used asynchronously or in parallel (multiple calls to inflate in separate callbacks, for example, cause it to fail), and, worst of all, does not support streaming.

I created fflate to resolve these issues while implementing some of the popular features requested here (e.g. ES6 modules). In addition, fflate adds asynchronous support via Web and Node workers (offloading to a separate thread entirely rather than just deferring on the event loop), plus parallelized ZIP support that easily beats JSZip. The bundle size also ends up much smaller than pako's because of the much simpler code. And as the benchmarks above show, it is much faster than pako.

The purpose of this issue is not to boast about my library or to call pako bad. I believe pako can become much faster while still staying closer than fflate or uzip to the real zlib (i.e. the internal lib/ directory). The overhauled implementation in Node 12, as mentioned in #193, is the perfect chance to escape the trap of matching the C source nearly line-for-line and instead become a canonical but high-performance rework in JavaScript. For this reason, I suggest optimizing some of the slower parts of the library. I found an even better hashing function that produced very few collisions in my testing, and for larger files/image files it outperforms even Node.js zlib in C; if you'd like to discuss, please let me know.

Personally: thank you for your incredible contributions to the open source community. I use pako in multiple projects and greatly appreciate its stability, not present in libraries such as my own.

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 26 (15 by maintainers)

Top GitHub Comments

1 reaction
puzrin commented, Nov 14, 2020

I found the reason for the inflate speed decrease: the compression ratio is much worse (a bigger inflate input means slower inflate).

let HASH_ZLIB = (s, prev, data) => ((prev << s.hash_shift) ^ data) & s.hash_mask;

pako deflate level 6, memLevel 8 => 257018 bytes


let HASH_FAST2 = (s, prev, data) => crc16(prev, data) & s.hash_mask;

pako deflate level 6, memLevel 8 => 536479 bytes


let HASH_FAST = (s, prev, data) => ((prev << 8) + (prev >> 8) + (data << 4)) & s.hash_mask;

pako deflate level 6, memLevel 8 => 536897 bytes

Increasing memLevel to 9 (for a 64K hash table) does not help.
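This behavior can be illustrated without pako's internals (synthetic data, illustrative names): deflate finds matches by chaining positions whose 3-byte window hashed to the same slot. The zlib hash depends on exactly the last 3 bytes (3 × hash_shift = hash_bits, so older bytes shift out), so a repeated trigram always lands in the same chain. HASH_FAST mixes in older history, so the same trigram at different positions usually lands in different slots and the match finder misses it — consistent with the output growing from 257018 to 536897 bytes above.

```javascript
// Illustration only, not pako internals.
const HASH_BITS = 15;                 // memLevel 8 => 8 + 7 bits
const HASH_MASK = (1 << HASH_BITS) - 1;
const HASH_SHIFT = 5;                 // ceil(HASH_BITS / MIN_MATCH), MIN_MATCH = 3

const HASH_ZLIB = (prev, data) => ((prev << HASH_SHIFT) ^ data) & HASH_MASK;
const HASH_FAST = (prev, data) => ((prev << 8) + (prev >> 8) + (data << 4)) & HASH_MASK;

// Fraction of repeated trigrams that hash to the same slot as their
// previous occurrence (1.0 is ideal for match finding).
function trigramConsistency(hash, buf) {
  const hashes = [];
  let h = 0;
  for (let i = 0; i < buf.length; i++) {
    h = hash(h, buf[i]);        // roll the hash one byte, as deflate does
    hashes.push(h);             // hashes[i] covers the trigram ending at i
  }
  const last = new Map();       // trigram -> hash at its previous occurrence
  let same = 0, total = 0;
  for (let i = 2; i < buf.length; i++) {
    const tri = buf.toString('latin1', i - 2, i + 1);
    if (last.has(tri)) {
      total++;
      if (last.get(tri) === hashes[i]) same++;
    }
    last.set(tri, hashes[i]);
  }
  return same / total;
}

const sample = Buffer.from('the quick brown fox jumps over the lazy dog '.repeat(500));
console.log('HASH_ZLIB consistency:', trigramConsistency(HASH_ZLIB, sample)); // 1
console.log('HASH_FAST consistency:', trigramConsistency(HASH_FAST, sample));
```

So the faster hash wins per call but loses overall: broken chains mean fewer matches, a much larger deflate output, and therefore slower inflate too.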

1 reaction
101arrowz commented, Nov 8, 2020

I’ve started to experiment with the codebase and will report back if I make progress. The hashing is not centralized (it is set in many different places), so I need to deep-dive into the code to tune its performance.

On another note, I saw you pushed some new commits for v2.0.0. If you’re going to be releasing a new version, I’d recommend you consider a new idea I came up with: auto-workerization.

Typical workerization techniques do not let you reuse existing synchronous code asynchronously, but I developed a function that can take a function (along with all of its dependencies), generate an inline worker string, and cache it for higher performance. It has been specifically optimized to work even after mangling, minification, and usage in any bundler. More importantly, you can reference other functions, classes, and static constants with this new method.

The main reason I suggest this is that very, very few people want to cause the environment (which could very well be a user’s browser) to hang while running a CPU intensive task on the main thread, such as deflate. Offering a method to offload this to a worker automatically is incredibly useful for many less-experienced developers who don’t understand workers and just want high performance and good UX.

You can see how I did it in fflate if you’d like to consider it.
