Put spaCy data in a shared path
Would it be possible to have spaCy data work similarly to NLTK_data, where it goes to a shared path, i.e., C:\nltk_data for Windows, /usr/local/share/nltk_data for macOS, or /usr/share/nltk_data for Unix (obviously substituting spacy_data for nltk_data)?
I understand that I can have it download to a custom location, but it would be nice to have it look for it automatically rather than having to set spacy.util.set_data_path() before calling spacy.load(), or passing a path argument to spacy.en.English.
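To make the request concrete, here is a minimal sketch of the kind of automatic lookup being asked for. It is not part of spaCy: the helper names default_candidates and find_shared_data are hypothetical, and the spacy_data directory name and per-platform locations are simply the ones proposed above.

```python
import sys
from pathlib import Path


def default_candidates(platform=sys.platform, dirname="spacy_data"):
    """Return the shared locations proposed in this issue for a platform.

    Hypothetical helper: these paths mirror the NLTK-style layout from the
    request and are not locations spaCy is known to search.
    """
    if platform.startswith("win"):
        return [Path("C:\\" + dirname)]
    if platform == "darwin":
        return [Path("/usr/local/share") / dirname]
    return [Path("/usr/share") / dirname]


def find_shared_data(candidates):
    """Return the first candidate that is an existing directory, else None."""
    for candidate in candidates:
        candidate = Path(candidate)
        if candidate.is_dir():
            return candidate
    return None


# Possible use with the call named in this issue (left commented out so the
# sketch runs without spaCy installed):
# data_dir = find_shared_data(default_candidates())
# if data_dir is not None:
#     spacy.util.set_data_path(data_dir)
```

With something like this, a lab administrator could deploy the data once to the shared path, and each user's session would pick it up without a per-user download.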
My use case for this is deploying it in computer labs, where it’d be preferable for me to be able to package and deploy the data without each user having to download it individually. This matters especially in cases where each user has an ~/anaconda folder, since the data downloads to ~/anaconda/lib/python3.5/site-packages/spacy/en/data for each user. It’d be (selfishly) easier for a user to be able to use spaCy without me telling them where the data is and without them filling up the HD.
If there’s a reason that it’s done the way it currently is, that’s fine 😃
Issue Analytics
- Created 7 years ago
- Comments:11 (6 by maintainers)
Top GitHub Comments
Just to give you a heads-up – this will be fixed in v1.7!
You’ll be able to store your data wherever you want, and download and install models directly, or using the new spacy.download command. Models can be installed as Python packages via pip or loaded in manually. There’ll also be a new command, spacy.link, that lets you set up symlinks for your models (local directory or installed Python package), so you can load them by name, e.g. spacy.load('my_cool_model'). This will also make it much easier to use your own models with spaCy.
We’re just in the process of reuploading all the models (taking a bit longer than expected, because we’ve trained new models and decided to provide different options, i.e. with GloVe vectors and without). But as soon as they’re up, we’ll push the new release and docs 🎉
/ cc: @cwhits, @jck
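The symlink idea in the comment above can be illustrated with a small standalone sketch. This is not spaCy's implementation of spacy.link: the functions link_model and resolve_model and the links_dir parameter are hypothetical, and the sketch only shows how a short name can be mapped to a model directory through a symlink.

```python
from pathlib import Path


def link_model(source, name, links_dir):
    """Create links_dir/name as a symlink pointing at the directory `source`.

    Hypothetical helper for illustration; an existing symlink with the same
    name is replaced.
    """
    links_dir = Path(links_dir)
    links_dir.mkdir(parents=True, exist_ok=True)
    link = links_dir / name
    if link.is_symlink():
        link.unlink()  # drop a stale link before re-creating it
    link.symlink_to(Path(source).resolve(), target_is_directory=True)
    return link


def resolve_model(name, links_dir):
    """Return the real directory that the shortcut `name` points to."""
    link = Path(links_dir) / name
    if not link.exists():
        raise FileNotFoundError("no model linked under the name %r" % name)
    return link.resolve()
```

For the computer-lab case, a model stored once in a shared directory could be linked under a short name, and every user would then refer to it by that name. Note that creating symlinks on Windows may require elevated privileges.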