Put spaCy data in a shared path

Would it be possible to have spaCy data work like nltk_data, where it goes to a shared path, e.g., C:\nltk_data on Windows, /usr/local/share/nltk_data on macOS, or /usr/share/nltk_data on Unix (substituting spacy_data for nltk_data, of course)?

I understand that I can have it download to a custom location, but it would be nice if spaCy looked for the data automatically, rather than my having to call spacy.util.set_data_path() before spacy.load() or pass a path argument to spacy.en.English.
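
The automatic lookup requested above can be sketched roughly as follows. This is a minimal, hypothetical illustration of an NLTK-style search and not how spaCy itself resolves data: the SPACY_DATA environment variable, the spacy_data directory name, and the functions candidate_paths and find_data_dir are all assumptions made for the example. It checks an explicit override first, then a per-user folder, then machine-wide shared folders, and returns the first directory that exists.

```python
import os
import sys
from pathlib import Path
from typing import List, Optional

# Hypothetical names: neither SPACY_DATA nor "spacy_data" is part of spaCy.
# They simply mirror the NLTK_DATA / nltk_data convention.
DATA_DIR_NAME = "spacy_data"
ENV_VAR = "SPACY_DATA"


def candidate_paths() -> List[Path]:
    """Return directories to search, most specific first."""
    paths: List[Path] = []
    # 1. Explicit override(s) from an environment variable (os.pathsep-separated).
    env = os.environ.get(ENV_VAR)
    if env:
        paths.extend(Path(p) for p in env.split(os.pathsep) if p)
    # 2. Per-user directory in the home folder.
    paths.append(Path.home() / DATA_DIR_NAME)
    # 3. Machine-wide shared directories, chosen by platform.
    if sys.platform.startswith("win"):
        paths.append(Path("C:/") / DATA_DIR_NAME)
    else:
        paths.append(Path("/usr/local/share") / DATA_DIR_NAME)
        paths.append(Path("/usr/share") / DATA_DIR_NAME)
    return paths


def find_data_dir(paths: Optional[List[Path]] = None) -> Optional[Path]:
    """Return the first candidate that is an existing directory, else None."""
    for path in paths if paths is not None else candidate_paths():
        if path.is_dir():
            return path
    return None
```

In a lab setup, an administrator would populate one shared directory once, and every user's lookup would fall through to it unless they set their own override.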

My use case is deploying spaCy in computer labs, where I'd prefer to package and deploy the data once rather than have each user download it individually. This matters especially when each user has their own ~/anaconda folder, since the data downloads to ~/anaconda/lib/python3.5/site-packages/spacy/en/data for every user. It'd be (selfishly) easier if users could use spaCy without me telling them where the data is, and without each of them filling up the hard drive.

If there’s a reason that it’s done the way it currently is, that’s fine 😃

Issue Analytics

  • State: closed
  • Created 7 years ago
  • Comments: 11 (6 by maintainers)

Top GitHub Comments

2 reactions
ines commented, Mar 17, 2017

Just to give you a heads-up – this will be fixed in v1.7!

You’ll be able to store your data wherever you want, and download and install models directly, or using the new spacy.download command. Models can be installed as Python packages via pip or loaded in manually. There’ll also be a new command spacy.link that lets you set up symlinks for your models (local directory or installed Python package), so you can load them by name, e.g. spacy.load('my_cool_model'). This will also make it much easier to use your own models with spaCy.

We’re just in the process of reuploading all the models (taking a bit longer than expected, because we’ve trained new models and decided to provide different options, i.e. with GloVe vectors and without). But as soon as they’re up, we’ll push the new release and docs 🎉

/ cc: @cwhits, @jck
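
The "load a model by name via a symlink" idea announced for v1.7 can be illustrated with the standard library alone. This is only a hypothetical sketch of the general mechanism, not spaCy's link command or its internals: link_model, resolve_model, and the links directory are invented for the example. A short name in a shared directory points at the real model folder, so code can refer to the name without knowing where the data actually lives. Note that creating symlinks on Windows may require administrator rights or Developer Mode.

```python
import os
from pathlib import Path


def link_model(model_dir: Path, name: str, links_dir: Path) -> Path:
    """Create links_dir/name as a symlink to model_dir (hypothetical helper)."""
    links_dir.mkdir(parents=True, exist_ok=True)
    link = links_dir / name
    # Refuse to silently overwrite an existing link or directory.
    if link.is_symlink() or link.exists():
        raise FileExistsError(f"link {name!r} already exists at {link}")
    os.symlink(model_dir.resolve(), link, target_is_directory=True)
    return link


def resolve_model(name: str, links_dir: Path) -> Path:
    """Map a short name to the real model directory by following the symlink."""
    link = links_dir / name
    if not link.is_dir():
        raise OSError(f"no model linked as {name!r} in {links_dir}")
    return link.resolve()
```

Because only the link lives in the shared location, the model itself can sit anywhere (a local folder or an installed package), which is what makes a single shared installation practical.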

0 reactions
lock[bot] commented, May 9, 2018

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
