Manylinux wheel size could be reduced by 66%
When building numpy from source and running the command strip, the resulting folder is 66% smaller than when using the manylinux wheel (via pip).
"[…] the strip program removes inessential information from executable binary programs and object files, thus potentially resulting in better performance and sometimes significantly less disk space usage." (https://en.wikipedia.org/wiki/Strip_(Unix))
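As a rough illustration of what strip discards, the symbol and debug sections of a shared object can be listed before and after stripping (a sketch only; readelf ships with binutils, and the .so path below is a placeholder):
# The .symtab and .debug_* sections are what strip removes;
# "some_module.so" is a placeholder for any unstripped extension module.
readelf -S some_module.so | grep -E 'symtab|debug'
strip some_module.so
readelf -S some_module.so | grep -E 'symtab|debug'   # prints nothing now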
Stripping is probably harmless on most systems, yet the space it saves is quite important in size-constrained environments (such as AWS Lambda).
Some developers have resorted to distributing their own stripped binaries (e.g. the lambda packages used by the serverless framework zappa), but that seems like a makeshift solution.
I think the problem should be solved upstream, as each library should be responsible for packaging its own optimized binaries.
System specifications
Every command below has been executed using the amazonlinux docker image (https://hub.docker.com/_/amazonlinux/).
Numpy version: 1.14.2
Python version: 3.6.2
How to replicate
Prepare docker image
docker run -it amazonlinux bash
yum update -y
yum install -y findutils binutils python36-devel gcc
Install wheel & measure package size
python3 -m pip install -t wheel numpy==1.14.2
du -sh wheel
-> 57 MB
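A large share of the wheel's footprint presumably comes from the bundled OpenBLAS; assuming the usual auditwheel layout, where vendored shared libraries live under numpy/.libs, this can be inspected with:
# Inspect the libraries vendored into the manylinux wheel
# (numpy/.libs is the conventional auditwheel location; treat the exact
# path as an assumption about this particular wheel's layout).
ls -lh wheel/numpy/.libs/
du -sh wheel/numpy/.libs/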
Try strip:
find wheel/ -name "*.so"|xargs strip
du -sh wheel
-> 56 MB
No real progress here. The binary wheel seems already stripped.
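One way to check this directly is to run file on the shared objects of a fresh install tree, before invoking strip; it reports whether each binary still carries a symbol table (a side check, not part of the original steps):
# "file" appends "not stripped" to any binary that still carries a
# symbol table; run this before the strip step above.
find wheel/ -name "*.so" -exec file {} \;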
Build from source & measure package size
python3 -m pip install -t build --no-binary numpy numpy==1.14.2
du -sh build
-> 42 MB
Try strip:
find build/ -name "*.so"|xargs strip
du -sh build
-> 19 MB
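An alternative to stripping after the fact is to avoid emitting debug information in the first place, for example by overriding the compiler flags during the source build (a sketch only; it assumes distutils appends the user-supplied CFLAGS after its defaults so -g0 takes precedence, and the build-nodebug directory name is illustrative):
# Build from source without debug info instead of stripping afterwards;
# whether this matches the stripped size is something to verify.
CFLAGS="-g0" python3 -m pip install -t build-nodebug --no-binary numpy numpy==1.14.2
du -sh build-nodebug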
Comments (14, 8 by maintainers)
If you do that please call it numpy-slow, so people know what they're getting into 😃

Hello @pv, thanks for looking into this.
As per the documentation:
The problem here is that, considering numpy is often installed alongside other packages from the Python scientific community (e.g. pandas, scipy, etc.), they all add up to an impressive size.
Now I know each library should address its own build problem separately, but since numpy is usually the main building block… maybe including openblas isn't what everyone needs.
Isn't there a way to offer an alternate "lightweight" installation? Maybe something along the lines of:
pip install numpy-light
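For reference, and relevant to the openblas point above, the BLAS/LAPACK configuration an installed numpy was built against can be inspected from Python (pointing PYTHONPATH at one of the target directories used above):
# Shows which BLAS/LAPACK libraries the installed numpy links against
# (e.g. the bundled OpenBLAS in the manylinux wheel).
PYTHONPATH=wheel python3 -c "import numpy; numpy.show_config()"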