Can't replace NER pipe in an existing model. Works in 2.2.4 but crashes in 2.3
How to reproduce the behaviour
Hello,
I am trying to replace the NER pipe from spaCy’s en_core_web_md with my own, since I don’t need any of the built-in NER but would like to keep the trained tagger and parser. The docs aren’t exactly clear about this, but what I do is below (using the ANIMAL training example):
nlp = spacy.load('en_core_web_md')
nlp.remove_pipe("ner")
ner = nlp.create_pipe("ner")
nlp.add_pipe(ner, last=True)
ner.add_label(LABEL)
train(TRAIN_DATA) # in here I call optimizer = nlp.begin_training() with the new entity
nlp.to_disk('en_core_new_ner')
This all goes fine: the model trains successfully (the losses look normal) and saves to and loads from disk. However, when I try to create a doc object and print the ents with the new model, I get this error:
File "/usr/local/lib/python3.7/site-packages/plac_core.py", line 232, in consume
return cmd, self.func(*(args + varargs + extraopts), **kwargs)
File "brain.py", line 192, in main
doc = brain.nlp(example)
File "/usr/local/lib/python3.7/site-packages/spacy/language.py", line 446, in __call__
doc = proc(doc, **component_cfg.get(name, {}))
File "pipes.pyx", line 398, in spacy.pipeline.pipes.Tagger.__call__
File "pipes.pyx", line 443, in spacy.pipeline.pipes.Tagger.set_annotations
File "morphology.pyx", line 292, in spacy.morphology.Morphology.assign_tag_id
ValueError: [E014] Unknown tag ID: 25
My assumption is that calling nlp.begin_training() is also screwing up the tagger/parser, but the folder clearly has the tagger and parser in it and I disable the other pipes before training. However, the tag_map in this new model looks different from the tag map in en_core_web_md. My training code is almost identical to the one in the training example https://spacy.io/usage/training#example-new-entity-type
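To make "looks different" concrete, the two tag maps can be compared key by key. The helper below is only a sketch: diff_tag_maps is a hypothetical name, not part of spaCy, and it assumes both tag maps are available as plain dictionaries (in spaCy 2.x these can most likely be read from nlp.vocab.morphology.tag_map on each loaded pipeline, but that attribute is an assumption here and is not used in the code).

```python
def diff_tag_maps(original, retrained):
    """Compare two tag maps given as plain dicts (tag -> attributes).

    Hypothetical helper, not part of spaCy. Returns the tags that are
    missing from the retrained map, newly added to it, or mapped to
    different attributes.
    """
    original_tags = set(original)
    retrained_tags = set(retrained)
    return {
        # Tags the original model knows but the retrained one lost.
        "missing": sorted(original_tags - retrained_tags),
        # Tags that only exist in the retrained model.
        "added": sorted(retrained_tags - original_tags),
        # Tags present in both maps but with different attributes.
        "changed": sorted(
            tag
            for tag in original_tags & retrained_tags
            if original[tag] != retrained[tag]
        ),
    }


# Toy dictionaries standing in for the real tag maps (assumed data).
original_map = {"NN": {"pos": "NOUN"}, "VB": {"pos": "VERB"}, "JJ": {"pos": "ADJ"}}
retrained_map = {"NN": {"pos": "NOUN"}, "VB": {"pos": "X"}}
print(diff_tag_maps(original_map, retrained_map))
```

A non-empty "missing" list would be consistent with the E014 error above, since the tagger could then predict a tag ID that the saved tag map no longer contains.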
Is there a clean way to keep the tagger and parser from an existing model while replacing the NER component? I don’t want any of the built-in entities, but I also don’t want to have to retrain a tagger and parser.
Your Environment
spaCy version 2.3.0
Location /usr/local/lib/python3.7/site-packages/spacy
Platform Darwin-18.6.0-x86_64-i386-64bit
Python version 3.7.7
Issue Analytics
- Created 3 years ago
- Comments: 5 (3 by maintainers)
Top GitHub Comments
Thanks for the report! This is due to a bug related to tag maps in 2.3.0 (fixed in #5641). As a workaround, copying the tagger directory in the model should work, even if it’s not the prettiest solution. I hope 2.3.1 will be ready today, if not then very soon.
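One possible reading of this workaround is to overwrite the tagger directory of the saved model with the one from the original model's data directory. The sketch below uses only the standard library. restore_tagger_dir is a hypothetical helper name, and both directory arguments are assumptions: the comment does not say where the original model data lives on disk, so those paths have to be located by hand.

```python
import shutil
from pathlib import Path


def restore_tagger_dir(original_model_dir, new_model_dir):
    """Replace new_model_dir/tagger with a copy of original_model_dir/tagger.

    Hypothetical helper: both arguments are paths to model data
    directories (the folders that contain the "tagger" subdirectory).
    """
    src = Path(original_model_dir) / "tagger"
    dst = Path(new_model_dir) / "tagger"
    if not src.is_dir():
        raise FileNotFoundError(f"No tagger directory found at {src}")
    if dst.exists():
        # Remove the tagger data written by nlp.to_disk() for the new model.
        shutil.rmtree(dst)
    # Copy the original tagger data, including its tag map, into place.
    shutil.copytree(src, dst)
    return dst
```

For the model from this report, the second argument would be the en_core_new_ner folder written by nlp.to_disk(); the first would be wherever the en_core_web_md data directory is installed. Upgrading to a release that contains the fix makes this step unnecessary.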