spacy installation extremely slow in docker

How to reproduce the problem

Details: When installing the spaCy library inside Docker, the installation is extremely slow because building the spaCy wheel takes a long time. Building and installing the library takes more than 50 minutes. Why does a library that is only 5.8 MB take this long?

Collecting spacy==2.2.0
  Downloading spacy-2.2.0.tar.gz (5.8 MB)
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Installing backend dependencies: started
  Installing backend dependencies: still running...
  Installing backend dependencies: still running...
  Installing backend dependencies: still running...
  Installing backend dependencies: still running...
  Installing backend dependencies: still running...
  Installing backend dependencies: still running...
  Installing backend dependencies: still running...
  Installing backend dependencies: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Collecting SQLAlchemy==1.3.12
  Downloading SQLAlchemy-1.3.12.tar.gz (6.0 MB)
Collecting testfixtures==6.10.3
  Downloading testfixtures-6.10.3-py2.py3-none-any.whl (86 kB)
Collecting toml==0.10.0
  Downloading toml-0.10.0-py2.py3-none-any.whl (25 kB)
Collecting twython==3.8.2
  Downloading twython-3.8.2-py3-none-any.whl (33 kB)
Collecting typed-ast==1.4.0
  Downloading typed_ast-1.4.0.tar.gz (206 kB)
Collecting urllib3==1.25.7
  Downloading urllib3-1.25.7-py2.py3-none-any.whl (125 kB)
Collecting wcwidth==0.1.8
  Downloading wcwidth-0.1.8-py2.py3-none-any.whl (17 kB)
Collecting Werkzeug==0.16.0
  Downloading Werkzeug-0.16.0-py2.py3-none-any.whl (327 kB)
Collecting wrapt==1.11.2
  Downloading wrapt-1.11.2.tar.gz (27 kB)
Collecting zipp==0.6.0
  Downloading zipp-0.6.0-py2.py3-none-any.whl (4.1 kB)
Requirement already satisfied: setuptools in /usr/local/lib/python3.7/site-packages (from flake8-blind-except==0.1.1->api==0.0.0) (50.3.0)
Processing /root/.cache/pip/wheels/19/1f/a7/60ea9eb3459854b86c53fa56907611a333a4c33bc71c7efab5/blis-0.4.1-cp37-cp37m-linux_x86_64.whl
Collecting plac<1.0.0,>=0.9.6
  Using cached plac-0.9.6-py2.py3-none-any.whl (20 kB)
Processing /root/.cache/pip/wheels/0a/2e/9c/7538feea0263ab707c7f68dc94668d7826a95a25d3ba48f88d/srsly-1.0.2-cp37-cp37m-linux_x86_64.whl
Processing /root/.cache/pip/wheels/b8/9c/e1/8b0abcf68d8c3cc33dbf598edc9cc604b4e7d4853706371398/murmurhash-1.0.2-cp37-cp37m-linux_x86_64.whl
Processing /root/.cache/pip/wheels/6b/49/58/315d7bcdd2de3f3575d95c9ef709f22bb3171cfa71c9351234/preshed-3.0.2-cp37-cp37m-linux_x86_64.whl
Processing /root/.cache/pip/wheels/f1/ea/43/a63cb13e58bb21579c46cbe64d32a4412bcd570b11a01f74d5/cymem-2.0.3-cp37-cp37m-linux_x86_64.whl
Processing /root/.cache/pip/wheels/4e/32/e3/11920446cbac93b90e0259320c3ebcf8be782977d7251ec45b/numpy-1.19.2-cp37-cp37m-linux_x86_64.whl
Processing /root/.cache/pip/wheels/a3/ea/d7/ba286d11640d554cd12929f510fa3cbd02d0bf8339b4eb2f57/thinc-7.1.1-cp37-cp37m-linux_x86_64.whl
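
A quick way to see which packages pip fetched as source archives (and therefore has to build locally) is to scan the log for downloads that are not `.whl` files. The following is a minimal sketch, not part of the original issue; the function name `source_downloads` and the `sample` text are illustrative, and the regular expression assumes the `Downloading <file> (<size>)` line format shown in the log above.

```python
import re

# Matches pip's "Downloading <file> (<size>)" lines as they appear in the log above.
DOWNLOAD_RE = re.compile(r"^\s*Downloading\s+(\S+)\s+\(")

def source_downloads(log_text):
    """Return downloaded filenames that are source archives rather than wheels."""
    found = []
    for line in log_text.splitlines():
        match = DOWNLOAD_RE.match(line)
        if match and not match.group(1).endswith(".whl"):
            found.append(match.group(1))
    return found

# A few lines copied from the log above, used as sample input.
sample = """\
Collecting spacy==2.2.0
  Downloading spacy-2.2.0.tar.gz (5.8 MB)
Collecting toml==0.10.0
  Downloading toml-0.10.0-py2.py3-none-any.whl (25 kB)
Collecting wrapt==1.11.2
  Downloading wrapt-1.11.2.tar.gz (27 kB)
"""

print(source_downloads(sample))
```

Every filename this returns is a package that pip must build inside the container, which is where the time goes.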

Your Environment

  • Operating System: docker run on Windows-10-10.0.19041-SP0
  • Python Version Used: 3.7 (inside Docker)
  • spaCy Version Used: 2.2.0
  • Environment Information: Docker

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 7 (4 by maintainers)

Top GitHub Comments

2 reactions
honnibal commented, Sep 28, 2020

There are likely a few other issues that are making this the worst case scenario for you.

If you’re using an Alpine Linux container, you probably shouldn’t: it’s a bad choice for Python because none of the PyPI wheels will work, so you’ll have to rebuild everything. If you’re determined to use Alpine Linux, you should host your own wheelhouse so that you don’t have to rebuild the wheels every time. I don’t think it’s worthwhile, but that’s up to you.

If you have to recompile everything, the longest compile times you’ll be facing are actually in numpy and cython-blis, which provide the numeric kernels.

Even if you’re only compiling spaCy itself, builds are quite a lot slower on WSL, due to its poor IO performance. Cython generates surprisingly large source files, because source size is treated as less important than the other trade-offs involved. The downside is that there’s a lot of disk activity, which slows the compilation down considerably.

We’d like to speed up compilation for our own development workflow, but we don’t currently consider it a usability issue. We’ve made sure wheels are available for the default workflows, and if you’re using a non-default workflow, it’s up to you to get configured so you can reuse wheels for your platform.
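
To check whether a container is in the Alpine situation described in the comment above, one option is to ask Python which C library it is linked against. This is only a rough sketch: the helper `manylinux_wheels_likely` is a hypothetical name, and the heuristic rests on the assumption that the pre-built Linux wheels in question are `manylinux` wheels, which target glibc-based systems. `platform.libc_ver()` may return empty strings when it cannot identify the library.

```python
import platform

def manylinux_wheels_likely(libc_name):
    """Rough heuristic: manylinux wheels target glibc-based Linux systems."""
    return libc_name.lower() == "glibc"

# platform.libc_ver() returns a (name, version) tuple; both may be empty strings.
libc_name, libc_version = platform.libc_ver()
print("libc:", libc_name or "unknown", libc_version)
print("manylinux wheels likely usable:", manylinux_wheels_likely(libc_name))
```

Run inside the container, a result other than glibc suggests that pip will fall back to source builds for compiled packages.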

2 reactions
adrianeboyd commented, Sep 28, 2020

Hmm, my first guess is that an older version of pip isn’t detecting that some pre-built wheels are compatible with your environment. Before installing spacy, try upgrading pip with pip install --upgrade pip.

There also aren’t pre-built python3.8 wheels for any releases of our packages that came out before python3.8 itself was released (we don’t go back and build additional wheels for earlier versions). So if you’re actually on python3.8 instead of python3.7, that could also explain why it’s compiling spacy and all its dependencies from scratch.
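
The Python version point can be checked by comparing the interpreter's tag with the Python tag in a wheel's filename (for example `cp37` in the cached wheels listed in the log). The sketch below is illustrative only: the helper names are hypothetical, it looks at the Python tag alone, and it ignores the ABI and platform tags that pip also checks, so treat it as a quick sanity check rather than a reimplementation of pip's compatibility logic.

```python
import sys

def wheel_python_tag(filename):
    """Return the Python tag from a wheel filename.

    Wheel names end with <python tag>-<abi tag>-<platform tag>.whl,
    so the Python tag is the third field from the end.
    """
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel filename: " + filename)
    return filename[:-len(".whl")].split("-")[-3]

def interpreter_tag():
    """CPython-style tag for the running interpreter, e.g. 'cp37' on Python 3.7."""
    return "cp{}{}".format(sys.version_info.major, sys.version_info.minor)

def tag_matches(filename, tag):
    """True if the wheel is built for `tag` or is a generic py3 wheel."""
    wheel_tags = wheel_python_tag(filename).split(".")
    return tag in wheel_tags or "py3" in wheel_tags

wheel = "thinc-7.1.1-cp37-cp37m-linux_x86_64.whl"  # filename taken from the log above
print("interpreter tag:", interpreter_tag())
print("matches cp37:", tag_matches(wheel, "cp37"))
print("matches cp38:", tag_matches(wheel, "cp38"))
```

If the interpreter tag printed inside the container is `cp38` while only `cp37` wheels exist for the pinned release, pip has no choice but to build from source.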
