Speeding up generating compressed files

In a project I work on, we use both CompressedStaticFilesMixin and the standalone compressor (python -m whitenoise.compress <DIR>) during Heroku deployments.

At the moment, these steps account for a considerable share (30-40%) of our deployment time.

For example, using Python 2.7.13, Django 1.11.5, WhiteNoise master, Brotli 0.6.0, and a Heroku-16 one-off performance-m dyno (2 cores, 2.5GB RAM, Ubuntu 16.04), with the static files directory cleared (to emulate deployment, since state intentionally isn’t carried over):

~ $ time ./manage.py collectstatic --noinput
...
156 static files copied to '/app/treeherder/static', 202 post-processed.

real    0m29.837s
user    0m29.405s
sys     0m0.359s

As a baseline, using the stock Django ManifestStaticFilesStorage results in:

real    0m1.031s
user    0m0.855s
sys     0m0.167s

For the above, the 202 files output from ManifestStaticFilesStorage have a combined file-size of 15MB.

Moving on to the standalone compressor (which we use on the output of a webpack build, for the SPA part of the project):

~ $ find dist/ -type f | wc -l
35
~ $ du -hs dist/
5.2M    dist/
~ $ time python -m whitenoise.compress dist/
...
real    0m11.929s
user    0m11.841s
sys     0m0.084s

Ideas off the top of my head to speed this up:

  1. Use concurrent.futures or similar to take advantage of all cores
  2. See if the scantree() implementation might be faster than compress.py’s os.walk() plus later stats
  3. Reduce the number of files being compressed (e.g. WHITENOISE_KEEP_ONLY_HASHED_FILES and #147)
  4. Profile both CompressedStaticFilesMixin and the CLI version, to double check that most of the time is indeed being spent in the compiled gzip/brotli code and not somewhere unexpected.
  5. Compare the performance of the gzip stdlib and compiled brotli python package with command line equivalents.
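
Idea 1 could look roughly like the sketch below. This is not WhiteNoise's code: `gzip_file`, `iter_files` and `compress_tree` are hypothetical names, it assumes Python 3, and it only covers gzip via the stdlib. It defaults to threads on the assumption that CPython's zlib releases the GIL while compressing. For Brotli, whether threads help depends on the binding, so a `ProcessPoolExecutor` can be passed in instead.

```python
import gzip
import os
import shutil
from concurrent.futures import ThreadPoolExecutor


def gzip_file(path):
    """Write path + '.gz' next to the original and return the new path."""
    gz_path = path + ".gz"
    with open(path, "rb") as src, gzip.open(gz_path, "wb", compresslevel=9) as dst:
        shutil.copyfileobj(src, dst)
    return gz_path


def iter_files(root):
    """Yield every file under root that is not already compressed output."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith((".gz", ".br")):
                yield os.path.join(dirpath, name)


def compress_tree(root, max_workers=None, executor_cls=ThreadPoolExecutor):
    """Compress all files under root concurrently; returns the paths written."""
    # Collect paths first so newly written .gz files are never picked up mid-walk.
    paths = list(iter_files(root))
    with executor_cls(max_workers=max_workers) as pool:
        return list(pool.map(gzip_file, paths))
```

If `executor_cls=ProcessPoolExecutor` is used, the call should sit under an `if __name__ == "__main__":` guard so that worker processes can import the module safely.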

Issue Analytics

  • State: open
  • Created 6 years ago
  • Reactions: 1
  • Comments: 5 (4 by maintainers)

Top GitHub Comments

1 reaction
edmorley commented, Sep 15, 2017

> Moving on to the standalone compressor (which we use on the output of a webpack build, for the SPA part of the project):

Breakdown of python -m whitenoise.compress dist/ times:

  • Both gzip and brotli: 11.93s
  • Just gzip (via: --no-brotli): 0.35s
  • Just Brotli (via: --no-gzip): 11.66s
  • --no-gzip --no-brotli: 0.05s (this walks the filesystem and reads files from disk, but does no compression or writes)

So this is all down to Brotli, not the filesystem walking/reading parts or gzip. The standalone compressor example here covered only 35 files, but even for a 10,000-file directory, Brotli compression times would dwarf everything else, even if the filesystem walking happened to be inefficient.
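
A per-codec breakdown like this can also be reproduced in-process, without the CLI flags. A rough sketch, assuming Python 3: `time_compressors` is a hypothetical helper (not part of WhiteNoise), and Brotli is only timed if the third-party `brotli` module is importable.

```python
import gzip
import time

try:
    import brotli  # third-party package; optional for this sketch
except ImportError:
    brotli = None


def time_compressors(blobs):
    """Return {codec_name: seconds} for compressing every blob with each codec."""
    codecs = {"gzip": lambda data: gzip.compress(data, compresslevel=9)}
    if brotli is not None:
        # Uses the binding's default quality setting.
        codecs["brotli"] = lambda data: brotli.compress(data)
    timings = {}
    for name, compress in codecs.items():
        start = time.perf_counter()
        for data in blobs:
            compress(data)
        timings[name] = time.perf_counter() - start
    return timings
```

Reading the files into `blobs` up front keeps disk I/O out of the measurement, so the numbers reflect compression time only.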

0 reactions
sonthonaxrk commented, Feb 16, 2021

Really, the compression level should be configurable.

https://github.com/evansd/whitenoise/blob/master/whitenoise/compress.py#L84
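
As an illustration of what a configurable level could mean in practice, here is a minimal sketch. It is not an existing WhiteNoise option: `compress_variants` and its `gzip_level` / `brotli_quality` parameters are hypothetical names, and the Brotli branch assumes the third-party `brotli` module, whose `compress()` accepts a `quality` keyword.

```python
import gzip

try:
    import brotli  # third-party package; optional for this sketch
except ImportError:
    brotli = None


def compress_variants(data, gzip_level=9, brotli_quality=11):
    """Return {'gz': bytes} plus 'br' when brotli is available, at the given levels."""
    variants = {"gz": gzip.compress(data, compresslevel=gzip_level)}
    if brotli is not None:
        # Lower quality values trade compression ratio for speed.
        variants["br"] = brotli.compress(data, quality=brotli_quality)
    return variants
```

Calling `compress_variants(data, brotli_quality=5)` during deployment, for instance, would trade some compression ratio for a shorter build; the right value would need measuring on real assets.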
