Combine 'ner' model with 'core_sci' model

Hi,

I am working on a project using neuralcoref and I would like to incorporate the scispacy ner models. My hope was to use one of the ner models in combination with the core_sci tagger and dependency parser.

NeuralCoref depends on the tagger, parser, and ner.

So far I have tried this code:

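import spacy

# Load the specialised NER model and the core pipeline separately, then swap in the NER.
cust_ner = spacy.load('en_ner_craft_md')
nlp = spacy.load('en_core_sci_md')
nlp.remove_pipe('ner')
nlp.add_pipe(cust_ner, name="ner", last=True)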

but when I pass text to the nlp object, I get the following error: TypeError: Argument 'string' has incorrect type (expected str, got spacy.tokens.doc.Doc)

When I look at the nlp.pipeline attribute after adding cust_ner to the pipeline, I see that cust_ner was added as a Language object rather than an EntityRecognizer object:

[('tagger', <spacy.pipeline.pipes.Tagger object at 0x7fb84976eda0>), ('parser', <spacy.pipeline.pipes.DependencyParser object at 0x7fb849516288>), ('ner', <spacy.lang.en.English object at 0x7fb853725668>)]

Before I start hacking away and writing terrible code, I thought I would reach out to see if you had any suggestions on how to accomplish what I am after.

Thanks in advance and for all that you folks do!

Top GitHub Comments

DeNeutoy commented, Dec 7, 2019

hmm, couple of issues here:

Until 0.2.4, none of the specialised NER models contained the full pipeline. I didn’t add it in because it fits with spacy’s naming convention {lang}_{model}_{data}_{size}. It’s not really a problem that 0.2.4 contains them (just a miscommunication between Daniel and me), and maybe it’s actually a good thing given this problem.
This seems to be a similar problem: https://support.prodi.gy/t/error-assigning-label-id-when-combining-custom-ner-model-from-prodigy-with-spacy-dependency-parsing-model/1444/2 Basically, what I think is happening is that spacy assumes that all NER labels are in the vocabulary. Here they are not, because the two models have different vocabs. You might find that just adding the literal strings the NER model needs for its labels to the vocabulary of the parser/tagger model works.
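
The suggestion above (adding the NER label strings to the other model's vocabulary) could be sketched roughly as follows. This is only an illustration, not a confirmed fix: it assumes spaCy 2.x (where add_pipe accepts a component object) and that both scispacy models are installed. The helper names add_ner_labels_to_vocab and build_combined_pipeline are made up for this example.

```python
def add_ner_labels_to_vocab(ner_pipe, vocab):
    """Register every label string of `ner_pipe` in `vocab`'s string store.

    Hypothetical helper: it relies only on `ner_pipe.labels` and
    `vocab.strings.add`, both of which exist on spaCy's EntityRecognizer
    and Vocab objects.
    """
    added = []
    for label in ner_pipe.labels:
        vocab.strings.add(label)
        added.append(label)
    return added


def build_combined_pipeline():
    # Imported lazily so the helper above can be used without spaCy installed.
    import spacy

    ner_model = spacy.load("en_ner_craft_md")
    nlp = spacy.load("en_core_sci_md")

    # Take the 'ner' component itself, not the whole Language object.
    ner_pipe = ner_model.get_pipe("ner")

    # Assumption from the comment above: the core model's vocab may be
    # missing the NER label strings, so add them before swapping pipes.
    add_ner_labels_to_vocab(ner_pipe, nlp.vocab)

    nlp.remove_pipe("ner")
    nlp.add_pipe(ner_pipe, name="ner", last=True)  # spaCy 2.x signature
    return nlp
```

Calling build_combined_pipeline() would return the core_sci pipeline with the CRAFT NER in place of its own. Whether the label strings are the only missing pieces of vocabulary is not established in this thread, so treat the result as something to test rather than rely on.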

masonedmison commented, Dec 6, 2019

So unfortunately the best I could do was to manually replace the vocab dir in the core_sci_md folder with the vocab found in the ner_craft_md folder.

I tried various ways to avoid manually copying the vocab folder, but none worked. It seems the tagger and parser depend on the tokenizer in sci_md, while the ner relies on the vocab. I tried to follow this logic and create a Language object from the class, but it did not work for me.

After moving the vocab dir, replace the ner pipe in the sci_md model with the ner model as shown below:

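import spacy

ner_mod = spacy.load('en_ner_craft_md')
# Take the 'ner' component by name; its position in the pipeline depends on the model version.
ner_pipe = ner_mod.get_pipe('ner')
nlp = spacy.load('en_core_sci_md')
nlp.remove_pipe('ner')
nlp.add_pipe(ner_pipe, name="ner", last=True)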
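
A quick way to catch the mistake from the original question (a whole Language object added as a single pipe) is to scan the pipeline for components that themselves carry a pipeline. The sketch below is a heuristic, not part of spaCy: find_nested_pipelines is a hypothetical helper, and it assumes that only Language objects expose a `pipeline` attribute.

```python
def find_nested_pipelines(nlp):
    """Return the names of components that look like whole pipelines.

    Heuristic: a spaCy Language object has a `pipeline` attribute, while
    individual components such as the tagger, parser or ner do not.
    """
    return [
        name
        for name, component in nlp.pipeline
        if hasattr(component, "pipeline")
    ]
```

An empty list from find_nested_pipelines(nlp) suggests every entry is an individual component. A non-empty list names the entries that should be replaced with a single component taken from the other model.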