spaCy-models: Please Consider Distributing via PyPI
Feature Summary
Release spaCy models via PyPI.
Feature Description
We use spaCy in an enterprise setting. For security, the hosts that build production Docker images cannot connect to the external internet. This introduces complexity when trying to install packages like `spacy-models`, where the recommended installation method is either to install from a GitHub release (requiring a connection to github.com) or to vendor the package (which avoids networking issues but bloats individual repos).
Publishing the models through PyPI would be beneficial in that `spacy-models` would no longer be installed differently from other packages. It would also allow us to benefit from the security that PyPI provides (e.g. the ability to mirror the package index on our internal network, assurance that package versions are immutable, etc.).
Perhaps you could start by adding the small models to PyPI, as they would not run into the default package size restrictions. PyPI allows package authors to file a request to increase the maximum allowable package size; the increased limits would easily support the medium models. There is also precedent for setting size limits that would allow for distributing the large models.
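To make the contrast concrete, here is a minimal sketch of the two install paths as pip command lines built in Python. It assumes the release URL layout used by the `spacy-models` GitHub repository and a hypothetical internal mirror at `https://pypi.internal.example/simple`; the version `2.2.0` is only an example, and the second command would work only if the model were actually published to an index that the mirror tracks.

```python
import sys

# Release URL layout of the spacy-models GitHub repository.
GITHUB_RELEASE = (
    "https://github.com/explosion/spacy-models/releases/download/"
    "{name}-{version}/{name}-{version}.tar.gz"
)


def pip_from_github(name, version):
    """Current approach: install the archive straight from a GitHub release."""
    url = GITHUB_RELEASE.format(name=name, version=version)
    return [sys.executable, "-m", "pip", "install", url]


def pip_from_index(name, version, index_url):
    """Requested approach: install a pinned version from a package index.

    index_url is an assumption (for example an internal PyPI mirror).
    """
    return [
        sys.executable, "-m", "pip", "install",
        "--index-url", index_url,
        f"{name}=={version}",
    ]


if __name__ == "__main__":
    # Hypothetical mirror URL, for illustration only.
    mirror = "https://pypi.internal.example/simple"
    print(" ".join(pip_from_github("en_core_web_sm", "2.2.0")))
    print(" ".join(pip_from_index("en_core_web_sm", "2.2.0", mirror)))
```

The first command needs network access to github.com from the build host, while the second only needs access to whichever index `--index-url` points at.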
Issue Analytics
- Created 3 years ago
- Comments: 7 (5 by maintainers)
Top GitHub Comments
Having the packages download the models from GitHub wouldn’t help with the security restrictions mentioned above.
The model packages are standard pip packages with longer names like `en_core_web_sm`. If you install the package from a downloaded `.tar.gz` from `spacy-models` or with `spacy download en_core_web_sm`, you’ll just have `en_core_web_sm` and no `en` shortcut.

In contrast, `spacy download en` does several things: 1) map the shortcut name `en` to the package `en_core_web_sm`, 2) download and install the `en_core_web_sm` package with pip, 3) add a symlink from `en` to `en_core_web_sm`. The symlink is a separate step that doesn’t involve pip or how the model package is installed.

We’ve realized that the symlinks cause a number of headaches, so we don’t recommend them anymore and are planning to remove them in spaCy v3. Then you will only be able to use the full package names like `en_core_web_sm` with `spacy.load()`.

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
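The difference between the two download commands described in the comment above can be sketched as a small planning function. This is a simplified model for illustration, not spaCy's actual implementation: the `SHORTCUTS` table, the `plan_download` function, and the step labels are all hypothetical.

```python
# Hypothetical shortcut table; spaCy maintains the real mapping itself.
SHORTCUTS = {"en": "en_core_web_sm"}


def plan_download(name):
    """Return (package, steps) for a `spacy download <name>`-style call.

    Step 1 resolves a shortcut to the full package name, step 2 is the
    pip install, and step 3 (the symlink) is added only when a shortcut
    was used, since it is separate from the pip installation.
    """
    package = SHORTCUTS.get(name, name)        # 1) map shortcut to package
    steps = [("pip_install", package)]         # 2) install package with pip
    if name != package:
        steps.append(("symlink", name, package))  # 3) link shortcut to package
    return package, steps


if __name__ == "__main__":
    for requested in ("en", "en_core_web_sm"):
        package, steps = plan_download(requested)
        print(requested, "->", package, steps)
```

Requesting the full package name produces only the pip step, which matches the behavior described for installing from a `.tar.gz` or with `spacy download en_core_web_sm`: the package is available under its full name, with no `en` shortcut.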