spacy installation extremely slow in docker
How to reproduce the problem
Details: When installing the spacy library inside Docker, the installation is extremely slow because spacy takes a long time to build its wheel. Installing and building spacy takes more than 50 minutes. Why does a library of just 5.8 MB take this long?
Collecting spacy==2.2.0
Downloading spacy-2.2.0.tar.gz (5.8 MB)
Installing build dependencies: started
Installing build dependencies: finished with status 'done'
Getting requirements to build wheel: started
Getting requirements to build wheel: finished with status 'done'
Installing backend dependencies: started
Installing backend dependencies: still running...
Installing backend dependencies: still running...
Installing backend dependencies: still running...
Installing backend dependencies: still running...
Installing backend dependencies: still running...
Installing backend dependencies: still running...
Installing backend dependencies: still running...
Installing backend dependencies: finished with status 'done'
Preparing wheel metadata: started
Preparing wheel metadata: finished with status 'done'
Collecting SQLAlchemy==1.3.12
Downloading SQLAlchemy-1.3.12.tar.gz (6.0 MB)
Collecting testfixtures==6.10.3
Downloading testfixtures-6.10.3-py2.py3-none-any.whl (86 kB)
Collecting toml==0.10.0
Downloading toml-0.10.0-py2.py3-none-any.whl (25 kB)
Collecting twython==3.8.2
Downloading twython-3.8.2-py3-none-any.whl (33 kB)
Collecting typed-ast==1.4.0
Downloading typed_ast-1.4.0.tar.gz (206 kB)
Collecting urllib3==1.25.7
Downloading urllib3-1.25.7-py2.py3-none-any.whl (125 kB)
Collecting wcwidth==0.1.8
Downloading wcwidth-0.1.8-py2.py3-none-any.whl (17 kB)
Collecting Werkzeug==0.16.0
Downloading Werkzeug-0.16.0-py2.py3-none-any.whl (327 kB)
Collecting wrapt==1.11.2
Downloading wrapt-1.11.2.tar.gz (27 kB)
Collecting zipp==0.6.0
Downloading zipp-0.6.0-py2.py3-none-any.whl (4.1 kB)
Requirement already satisfied: setuptools in /usr/local/lib/python3.7/site-packages (from flake8-blind-except==0.1.1->api==0.0.0) (50.3.0)
Processing /root/.cache/pip/wheels/19/1f/a7/60ea9eb3459854b86c53fa56907611a333a4c33bc71c7efab5/blis-0.4.1-cp37-cp37m-linux_x86_64.whl
Collecting plac<1.0.0,>=0.9.6
Using cached plac-0.9.6-py2.py3-none-any.whl (20 kB)
Processing /root/.cache/pip/wheels/0a/2e/9c/7538feea0263ab707c7f68dc94668d7826a95a25d3ba48f88d/srsly-1.0.2-cp37-cp37m-linux_x86_64.whl
Processing /root/.cache/pip/wheels/b8/9c/e1/8b0abcf68d8c3cc33dbf598edc9cc604b4e7d4853706371398/murmurhash-1.0.2-cp37-cp37m-linux_x86_64.whl
Processing /root/.cache/pip/wheels/6b/49/58/315d7bcdd2de3f3575d95c9ef709f22bb3171cfa71c9351234/preshed-3.0.2-cp37-cp37m-linux_x86_64.whl
Processing /root/.cache/pip/wheels/f1/ea/43/a63cb13e58bb21579c46cbe64d32a4412bcd570b11a01f74d5/cymem-2.0.3-cp37-cp37m-linux_x86_64.whl
Processing /root/.cache/pip/wheels/4e/32/e3/11920446cbac93b90e0259320c3ebcf8be782977d7251ec45b/numpy-1.19.2-cp37-cp37m-linux_x86_64.whl
Processing /root/.cache/pip/wheels/a3/ea/d7/ba286d11640d554cd12929f510fa3cbd02d0bf8339b4eb2f57/thinc-7.1.1-cp37-cp37m-linux_x86_64.whl
Your Environment
- Operating System: Docker running on Windows-10-10.0.19041-SP0
- Python Version Used: 3.7 (inside Docker)
- spaCy Version Used: 2.2.0
- Environment Information: Docker
Issue Analytics
- State:
- Created 3 years ago
- Comments: 7 (4 by maintainers)
Top GitHub Comments
There are likely a few other issues that are making this the worst-case scenario for you.
If you’re using an Alpine Linux container, you probably shouldn’t: it’s a bad choice for Python because none of the PyPI wheels will work, so you’ll have to rebuild everything. If you’re determined to use Alpine Linux, you should host your own wheelhouse so that you don’t have to rebuild the wheels every time. I don’t think it’s worthwhile, but that’s up to you.
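To make the wheel-compatibility point concrete, here is a minimal sketch that reads the tags out of a wheel filename. It assumes the usual background that manylinux wheels are built against glibc, while Alpine uses musl. The helper names `wheel_tags` and `is_glibc_only` are made up for illustration, and the second filename is a hypothetical example rather than a real release file.

```python
from typing import Tuple


def wheel_tags(filename: str) -> Tuple[str, str, str]:
    """Return (python tag, ABI tag, platform tag) from a wheel filename."""
    stem = filename[:-len(".whl")] if filename.endswith(".whl") else filename
    parts = stem.split("-")
    # The last three dash-separated fields are the tags; an optional
    # build tag, if present, sits before them.
    return parts[-3], parts[-2], parts[-1]


def is_glibc_only(platform_tag: str) -> bool:
    """True if any platform tag in the (possibly dotted) set is manylinux."""
    return any(tag.startswith("manylinux") for tag in platform_tag.split("."))


# First name is taken from the pip log in this issue (a locally built wheel);
# the second is a hypothetical manylinux wheel name.
for name in (
    "blis-0.4.1-cp37-cp37m-linux_x86_64.whl",
    "example-1.0-cp37-cp37m-manylinux1_x86_64.whl",
):
    python_tag, abi_tag, platform_tag = wheel_tags(name)
    print(name, "->", platform_tag, "| glibc-only:", is_glibc_only(platform_tag))
```

A wheel whose platform tag is manylinux-only is skipped by pip on a musl-based image, which is why pip falls back to the source tarball and compiles.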
If you have to recompile everything, the longest compile times you’ll be facing are actually in numpy and cython-blis, which provide the numeric kernels.
Even if you’re only compiling spaCy itself, builds are quite a lot slower on WSL because of its poor IO performance. Cython generates surprisingly large source files; of all the trade-offs involved, file size is considered less important than other factors. The downside is a lot of disk activity, which slows compilation down considerably.
We’d like to speed up compilation for our own development workflow, but we don’t currently consider it a usability issue. We’ve made sure wheels are available for the default workflows, and if you’re using a non-default workflow, it’s up to you to get configured so you can reuse wheels for your platform.
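One way to reuse wheels for a non-default platform is a local wheelhouse: compile everything once with `pip wheel`, then install from that directory on later builds. The sketch below only assembles the two pip command lines. The function names, `requirements.txt`, and `/wheelhouse` are placeholders chosen for illustration, not anything taken from the spaCy project.

```python
import sys
from typing import List


def build_wheelhouse_cmd(requirements: str, wheel_dir: str) -> List[str]:
    """pip command that builds wheels for every requirement into wheel_dir."""
    return [
        sys.executable, "-m", "pip", "wheel",
        "--requirement", requirements,
        "--wheel-dir", wheel_dir,
    ]


def install_from_wheelhouse_cmd(requirements: str, wheel_dir: str) -> List[str]:
    """pip command that installs only from wheel_dir, never from PyPI."""
    return [
        sys.executable, "-m", "pip", "install",
        "--no-index",                # do not contact the package index
        "--find-links", wheel_dir,   # look for wheels in the local directory
        "--requirement", requirements,
    ]


# Hypothetical paths; adjust to wherever the wheels are stored.
build_cmd = build_wheelhouse_cmd("requirements.txt", "/wheelhouse")
install_cmd = install_from_wheelhouse_cmd("requirements.txt", "/wheelhouse")
print(" ".join(build_cmd))
print(" ".join(install_cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)`. The slow compile then happens only when the wheelhouse is built, provided the directory survives between Docker builds (for example as a mounted volume or a cached layer).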
Hmm, my first guess is that an older version of pip isn’t detecting that some pre-built wheels are compatible with your environment. Try upgrading pip before installing spacy, with pip install --upgrade pip.
There also aren’t pre-built wheels for our packages for python3.8 for any releases prior to python3.8’s release date (we don’t go back to build additional wheels for earlier versions), so if it’s python3.8 instead of python3.7, that could also explain why it’s compiling spacy and all its dependencies from scratch.
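The suggestion above can be sketched as two pip invocations: upgrade pip first, then install with pip's `--only-binary=:all:` option so that a missing wheel produces an immediate error instead of a long source build. The `--only-binary` step is an addition for diagnosis, not something the maintainers proposed, and the helper names are invented for this example.

```python
import sys
from typing import List


def upgrade_pip_cmd() -> List[str]:
    """pip command that upgrades pip itself in the current interpreter."""
    return [sys.executable, "-m", "pip", "install", "--upgrade", "pip"]


def binary_only_install_cmd(requirement: str) -> List[str]:
    """pip command that refuses to build from source for any package."""
    return [
        sys.executable, "-m", "pip", "install",
        "--only-binary=:all:",  # fail fast if no compatible wheel exists
        requirement,
    ]


# The Python version decides which wheels are candidates at all.
print("Python %d.%d" % sys.version_info[:2])
print(" ".join(upgrade_pip_cmd()))
print(" ".join(binary_only_install_cmd("spacy==2.2.0")))
```

If the second command fails with no matching distribution, no pre-built wheel exists for that interpreter and platform, and the slow build comes from compiling rather than from the network.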