Speeding up generation of compressed files
In a project I work on, we use both `CompressedStaticFilesMixin` and the standalone compressor (`python -m whitenoise.compress <DIR>`) during Heroku deployments.
At the moment these steps account for a considerable percentage (30-40%) of our deployment time.
For example, using Python 2.7.13, Django 1.11.5, WhiteNoise master and Brotli 0.6.0 on a Heroku-16 one-off performance-m dyno (2 cores, 2.5GB RAM, Ubuntu 16.04), with the static files directory cleared first (to emulate deployment, since state intentionally isn't carried over):
```
~ $ time ./manage.py collectstatic --noinput
...
156 static files copied to '/app/treeherder/static', 202 post-processed.

real    0m29.837s
user    0m29.405s
sys     0m0.359s
```
As a baseline, using the stock Django `ManifestStaticFilesStorage` results in:

```
real    0m1.031s
user    0m0.855s
sys     0m0.167s
```
For the above, the 202 files output by `ManifestStaticFilesStorage` have a combined file size of 15MB.
Moving on to the standalone compressor (which we use on the output of a webpack build, for the SPA part of the project):
```
~ $ find dist/ -type f | wc -l
35
~ $ du -hs dist/
5.2M    dist/
~ $ time python -m whitenoise.compress dist/
...

real    0m11.929s
user    0m11.841s
sys     0m0.084s
```
Ideas off the top of my head to speed this up:
- Use `concurrent.futures` or similar to take advantage of all cores (see the sketch after this list).
- See if the `scantree()` implementation might be faster than compress.py's `os.walk()` plus later stats.
- Reduce the number of files being compressed (e.g. `WHITENOISE_KEEP_ONLY_HASHED_FILES` and #147).
- Profile both `CompressedStaticFilesMixin` and the CLI version, to double-check that most of the time is indeed being spent in the compiled gzip/brotli code and not somewhere unexpected.
- Compare the performance of the stdlib gzip and the compiled brotli Python package with their command-line equivalents.
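For the `concurrent.futures` idea, here's a minimal sketch of what a parallel compressor could look like. It deliberately bypasses whitenoise's own code: the skip-extension list and the `.gz`/`.br` naming only approximate what whitenoise does, and `find_files`/`compress_one` are made-up helpers for illustration (Python 3 shown for brevity; on 2.7 you'd need the `futures` backport and `gzip.GzipFile` instead of `gzip.compress`):

```python
import gzip
import os
import sys
from concurrent.futures import ProcessPoolExecutor

import brotli  # pip install brotli

# Rough approximation of whitenoise's "skip already-compressed files" rule.
SKIP_EXTENSIONS = ('.gz', '.br', '.jpg', '.jpeg', '.png', '.gif', '.woff', '.woff2')


def find_files(root):
    # Yield every compressible file under `root`.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(SKIP_EXTENSIONS):
                yield os.path.join(dirpath, name)


def compress_one(path):
    # Write `path`.gz and `path`.br alongside the original file.
    with open(path, 'rb') as f:
        data = f.read()
    with open(path + '.gz', 'wb') as f:
        f.write(gzip.compress(data))
    with open(path + '.br', 'wb') as f:
        f.write(brotli.compress(data))
    return path


def main(root):
    # A process pool rather than threads: it isn't obvious whether the
    # brotli binding releases the GIL, so processes are the safe default.
    with ProcessPoolExecutor() as executor:
        for path in executor.map(compress_one, find_files(root)):
            print('Compressed', path)


if __name__ == '__main__':
    main(sys.argv[1])
```

On a 2-core dyno this can at best roughly halve the wall-clock time, so it helps without removing the underlying cost.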
Breakdown of `python -m whitenoise.compress dist/` times:

- gzip only (`--no-brotli`): 0.35s
- Brotli only (`--no-gzip`): 11.66s
- neither (`--no-gzip --no-brotli`): 0.05s (this walks the filesystem and reads the files from disk, but performs no compression or writes)

So this is all down to Brotli, and not the filesystem walking/reading parts or gzip (albeit the standalone compressor example here covered just 35 files; but even for a 10,000-file directory, Brotli compression times would dwarf everything else even if the filesystem walking happened to be inefficient).
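Relatedly, for the "profile to double-check" idea above: one way to profile the unmodified CLI, without assuming anything about whitenoise's internal function names, is to re-run the module under cProfile via runpy. A sketch (the `dist/` argument and the output filename are just examples):

```python
import cProfile
import pstats
import runpy
import sys

# Re-run `python -m whitenoise.compress dist/` under the profiler.
sys.argv = ['whitenoise.compress', 'dist/']
cProfile.run("runpy.run_module('whitenoise.compress', run_name='__main__')",
             'compress.prof')

# Show the ten most expensive calls by cumulative time.
pstats.Stats('compress.prof').sort_stats('cumulative').print_stats(10)
```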
Really, the compression level should be configurable.
https://github.com/evansd/whitenoise/blob/master/whitenoise/compress.py#L84
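For reference, the brotli package's `compress()` takes a `quality` argument (0-11) and defaults to 11, the slowest and smallest setting. A throwaway harness to measure the speed/size trade-off on one of your own assets (the input path here is just an example):

```python
import time

import brotli  # pip install brotli

with open('dist/vendor.js', 'rb') as f:  # any large text asset
    data = f.read()

for quality in (4, 9, 11):  # 11 is the brotli package's default
    start = time.time()
    size = len(brotli.compress(data, quality=quality))
    print('quality=%2d  %8d bytes  %5.2fs' % (quality, size, time.time() - start))
```

If lower quality levels turn out to be good enough, exposing the level (e.g. via a hypothetical `--quality` CLI option and a matching Django setting) would let each deployment pick its own trade-off.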