Combine 'ner' model with 'core_sci' model
Hi,
I am working on a project using neuralcoref and I would like to incorporate the scispacy ner models. My hope was to use one of the ner models in combination with the core_sci tagger and dependency parser. NeuralCoref depends on the tagger, parser, and ner.
So far I have tried this code:
import spacy
cust_ner = spacy.load('en_ner_craft_md')
nlp = spacy.load('en_core_sci_md')
nlp.remove_pipe('ner')
nlp.add_pipe(cust_ner, name="ner", last=True)
but when I pass text to the nlp object, I get the following error:
TypeError: Argument 'string' has incorrect type (expected str, got spacy.tokens.doc.Doc)
When I look at the nlp.pipeline attribute after adding the cust_ner to the pipe, I see the cust_ner added as a Language object rather than an EntityRecognizer object:
[('tagger', <spacy.pipeline.pipes.Tagger object at 0x7fb84976eda0>), ('parser', <spacy.pipeline.pipes.DependencyParser object at 0x7fb849516288>), ('ner', <spacy.lang.en.English object at 0x7fb853725668>)]
Before I start hacking away and writing terrible code, I thought I would reach out to see if you had any suggestions on how to accomplish what I am after.
Thanks in advance and for all that you folks do!
Top GitHub Comments
Hmm, a couple of issues here:
Until 0.2.4, none of the specialised NER models contained the full pipeline. I didn’t add it in because it fits with spacy’s naming convention {lang}_{model}_{data}_{size}. It’s not really a problem that 0.2.4 contains them (just a miscommunication between Daniel and me), and maybe it’s actually a good thing given this problem.
https://support.prodi.gy/t/error-assigning-label-id-when-combining-custom-ner-model-from-prodigy-with-spacy-dependency-parsing-model/1444/2 This seems to be a similar problem. Basically what I think is happening is that spacy assumes that all NER labels are in the vocabulary - here they are not, because the vocabs are different. You might find that just adding the literal strings the NER model needs for its labels to the vocabulary of the one for the parser/tagger works.
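The suggestion above could be sketched roughly as follows. This is an untested illustration, assuming the spaCy v2.x API (get_pipe, remove_pipe, add_pipe with a component object, vocab.strings.add); transplant_ner is a hypothetical helper name, not part of spaCy or scispacy, and nothing in this thread confirms that registering the label strings is enough on its own. It also takes the EntityRecognizer out of the loaded NER model instead of adding the whole Language object, which appears to be what causes the TypeError in the question.

```python
def transplant_ner(source_nlp, target_nlp, name="ner"):
    """Move the NER component of source_nlp into target_nlp.

    Hypothetical helper; assumes the spaCy v2.x Language API.
    """
    # Take the EntityRecognizer itself, not the whole Language object.
    # A Language object expects a str, while pipeline components are
    # called with a Doc.
    ner = source_nlp.get_pipe(name)

    # Register every NER label string in the target vocab so the label
    # hashes can be resolved back to strings there.
    for label in ner.labels:
        target_nlp.vocab.strings.add(label)

    # Swap out the existing NER component, if there is one.
    if name in target_nlp.pipe_names:
        target_nlp.remove_pipe(name)
    target_nlp.add_pipe(ner, name=name, last=True)
    return target_nlp


# Intended usage (needs the scispacy models from the question installed):
#   import spacy
#   nlp = transplant_ner(spacy.load("en_ner_craft_md"),
#                        spacy.load("en_core_sci_md"))
#   print(nlp.pipeline)
```

If this works as hoped, nlp.pipeline should list an EntityRecognizer under 'ner' rather than an English object. If the label lookup still fails, the vocabularies probably differ in more than the label strings.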
So unfortunately the best I could do was to manually replace the vocab dir in the core_sci_md folder with the vocab found in the ner_craft_md folder.
I tried various ways to not have to manually copy the vocab folder, but none worked. It seems the tagger, parser depend on the tokenizer in the sci_md and the ner relies on the vocab. I tried to follow this logic and create a Language object from class but it did not work for me.
After moving the vocab dir, replace the ner pipe in the sci_md model with the ner model as shown below: