OSError: [E050] in nlp.initialize()
I am trying to train an entity linker, but nlp.initialize() raises the following error:
OSError: [E050] Can't find model 'corpus/en_vectors'. It doesn't seem to be a Python package or a valid path to a data directory.
I’ve looked at the docs here, but am struggling to see why the error is occurring.
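One plausible reading of the error message is that the loaded pipeline's config still names 'corpus/en_vectors' as the vectors to load at initialization, and that this path does not exist on the local machine. The sketch below is a minimal, spaCy-free way to check that idea against a config mapping. The helper name `missing_init_vectors` and the sample config are hypothetical, and the key layout `["initialize"]["vectors"]` is an assumption based on the error text, not something confirmed in this thread.

```python
from pathlib import Path
from typing import Any, Mapping, Optional


def missing_init_vectors(config: Mapping[str, Any]) -> Optional[str]:
    """Return the [initialize] vectors setting if it is not an existing path.

    Returns None when no vectors are configured or when the path exists.
    """
    vectors = config.get("initialize", {}).get("vectors")
    if not vectors:
        return None
    # A value that is not a path could still be an installed package name,
    # so a non-None result is a hint to investigate, not proof of a problem.
    return None if Path(str(vectors)).exists() else str(vectors)


# Stand-in for a pipeline config (hypothetical values, assumed layout).
example_config = {"initialize": {"vectors": "corpus/en_vectors"}}
print(missing_init_vectors(example_config))
```

With a loaded pipeline, the same helper could presumably be called as `missing_init_vectors(nlp.config.interpolate())`, assuming the config needs its variables resolved first.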
How to reproduce the behaviour
import random
import spacy
from spacy.kb import KnowledgeBase
from spacy.training import Example
from spacy.util import compounding, minibatch
nlp = spacy.load('en_core_web_lg')
# here I usually load my local vocab path, but the same error occurs without this
# nlp.vocab.from_disk(self.vocab_path)
nlp.vocab.vectors.name = "spacy_pretrained_vectors"
def create_kb(vocab):
    entity_vector_length = 300
    kb = KnowledgeBase(vocab=vocab, entity_vector_length=entity_vector_length)
    # here I usually load my local knowledge base, but the same error occurs if you don't add anything
    # kb.from_disk(self.kb_path)
    return kb
entity_linker = nlp.add_pipe("entity_linker")
entity_linker.set_kb(create_kb)
train_data = []
text_1 = "Russ Cochran his reprints include EC Comics."
dict_1 = {(0, 12): {"Q7381115": 1.0, "Q2146908": 0.0}}
train_data.append((text_1, {"links": dict_1}))
text_2 = "Russ Cochran has been publishing comic art."
dict_2 = {(0, 12): {"Q7381115": 1.0, "Q2146908": 0.0}}
train_data.append((text_2, {"links": dict_2}))
text_3 = "Russ Cochran captured his first major title with his son as caddie."
dict_3 = {(0, 12): {"Q7381115": 0.0, "Q2146908": 1.0}}
train_data.append((text_3, {"links": dict_3}))
text_4 = "Russ Cochran was a member of University of Kentucky's golf team."
dict_4 = {(0, 12): {"Q7381115": 0.0, "Q2146908": 1.0}}
train_data.append((text_4, {"links": dict_4}))
examples = []
for text, annotation in train_data:
    doc = nlp.make_doc(text)
    example = Example.from_dict(doc, annotation)
    examples.append(example)
n_iter = 10  # not defined in the original snippet; value assumed for illustration
other_pipes = [pipe for pipe in nlp.pipe_names if pipe != "entity_linker"]
with nlp.select_pipes(disable=other_pipes):
    optimizer = nlp.initialize()  # the OSError [E050] is raised here
    for itn in range(n_iter):
        random.shuffle(examples)
        losses = {}
        batches = minibatch(examples, size=compounding(4.0, 32.0, 1.001))
        for batch in batches:
            nlp.update(batch, drop=0.2, losses=losses, sgd=optimizer)
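The "links" annotations in the snippet above map character offsets in the text to candidate KB ids with a probability each. Before training, it can help to confirm that every offset pair really covers the intended mention. The following is a small, spaCy-free sketch of such a check. The helper name `link_mentions` is hypothetical and not part of spaCy.

```python
from typing import Dict, List, Tuple

# (start_char, end_char) -> {kb_id: probability}
Links = Dict[Tuple[int, int], Dict[str, float]]


def link_mentions(text: str, links: Links) -> List[Tuple[str, str]]:
    """Return (mention text, highest-scoring KB id) for each annotated span."""
    result = []
    for (start, end), candidates in sorted(links.items()):
        if not 0 <= start < end <= len(text):
            raise ValueError(f"span ({start}, {end}) is outside the text")
        best_id = max(candidates, key=candidates.get)
        result.append((text[start:end], best_id))
    return result


text_1 = "Russ Cochran his reprints include EC Comics."
dict_1 = {(0, 12): {"Q7381115": 1.0, "Q2146908": 0.0}}
print(link_mentions(text_1, dict_1))  # prints [('Russ Cochran', 'Q7381115')]
```

A span that is off by one character would show up here as a truncated or padded mention string, which is easier to spot than a silent misalignment during training.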
I’m getting this same issue when I try to load my own knowledge base and vocab. For this I thought maybe I needed to change the config file to point to the correct vectors location (which is “local_kb_path/vocab/vectors”), so I tried:
config = {"initialize": {"vectors": 'local_kb_path/vocab/vectors'}}
entity_linker = nlp.add_pipe("entity_linker", config=config)
but this gives ‘extra fields not permitted’.
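A likely explanation for 'extra fields not permitted' is that the config passed to add_pipe is checked against the arguments the component factory accepts, and "initialize" is not one of them. The sketch below illustrates that kind of check with the standard library only. `make_linker_stub` is a hypothetical stand-in and does not reproduce spaCy's real factory signature, and `unexpected_config_keys` is likewise an illustrative helper, not a spaCy function.

```python
import inspect
from typing import Any, Callable, Dict, List


def unexpected_config_keys(factory: Callable[..., Any], config: Dict[str, Any]) -> List[str]:
    """List config keys that do not match any parameter of the factory."""
    accepted = inspect.signature(factory).parameters
    return sorted(key for key in config if key not in accepted)


# Hypothetical stand-in for a component factory; NOT spaCy's real signature.
def make_linker_stub(nlp: Any, name: str, n_sents: int = 0, incl_prior: bool = True) -> None:
    return None


config = {"initialize": {"vectors": "local_kb_path/vocab/vectors"}}
print(unexpected_config_keys(make_linker_stub, config))  # prints ['initialize']
```

Under this reading, the vectors location belongs to the pipeline-level initialization settings rather than to the entity_linker component's own config.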
Many thanks!
Your Environment
- spaCy version: 3.0.0rc2
- Platform: Darwin-18.6.0-x86_64-i386-64bit
- Python version: 3.7.9
- Pipelines: en_core_web_md (3.0.0a0), en_core_web_lg (3.0.0a0)
Issue Analytics
- Created 3 years ago
- Comments:6 (4 by maintainers)
Top GitHub Comments
Sounds good! I’ll close this in the meantime, but feel free to reopen or open a new issue if you can’t get it to work. (for usage questions, you can also use our new discussion board btw! https://github.com/explosion/spaCy/discussions)
Thank you @svlandeg I will let you know how I get on with that advice 👍