Combine 'ner' model with 'core_sci' model

See original GitHub issue

Hi,

I am working on a project using neuralcoref and I would like to incorporate the scispacy ner models. My hope was to use one of the ner models in combination with the core_sci tagger and dependency parser.

NeuralCoref depends on the tagger, parser, and ner.

So far I have tried this code:

cust_ner = spacy.load('en_ner_craft_md')
nlp = spacy.load('en_core_sci_md')
nlp.remove_pipe('ner')
nlp.add_pipe(cust_ner, name="ner", last=True)

but when I pass text to the nlp object, I get the following error: TypeError: Argument 'string' has incorrect type (expected str, got spacy.tokens.doc.Doc)

When I look at the nlp.pipeline attribute after adding cust_ner to the pipeline, I see that cust_ner was added as a Language object rather than an EntityRecognizer object:

[('tagger', <spacy.pipeline.pipes.Tagger object at 0x7fb84976eda0>), ('parser', <spacy.pipeline.pipes.DependencyParser object at 0x7fb849516288>), ('ner', <spacy.lang.en.English object at 0x7fb853725668>)]

Before I start hacking away and writing terrible code, I thought I would reach out to see if you had any suggestions on how to accomplish what I am after.

Thanks in advance and for all that you folks do!

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 26

Top GitHub Comments

1 reaction
DeNeutoy commented, Dec 7, 2019

hmm, couple of issues here:

  1. Until 0.2.4, none of the specialised NER models contained the full pipeline. I left it out to stay consistent with spaCy's naming convention {lang}_{model}_{data}_{size}. It's not really a problem that the 0.2.4 models contain it (just a miscommunication between Daniel and me), and maybe it's actually a good thing given this problem.

  2. https://support.prodi.gy/t/error-assigning-label-id-when-combining-custom-ner-model-from-prodigy-with-spacy-dependency-parsing-model/1444/2 This seems to be a similar problem. Basically, what I think is happening is that spaCy assumes that all NER labels are in the vocabulary. Here they are not, because the two models have different vocabs. You might find that just adding the literal strings the NER model needs for its labels to the vocabulary of the parser/tagger model works.
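The suggestion in point 2 could be sketched roughly as follows. This is an untested sketch of the idea, not a confirmed fix: the helper names `add_labels_to_vocab` and `combine_models` are made up for illustration, the code assumes the spaCy v2 API used elsewhere in this thread, and it assumes both scispaCy models are installed. Whether registering the label strings alone is enough to resolve the error is exactly the open question in the comment above.

```python
def add_labels_to_vocab(labels, string_store):
    """Add each label to a StringStore-like object (anything supporting
    `in` and `.add`). Returns the labels that were missing beforehand."""
    added = []
    for label in labels:
        if label not in string_store:
            string_store.add(label)
            added.append(label)
    return added


def combine_models(core_name="en_core_sci_md", ner_name="en_ner_craft_md"):
    """Hypothetical helper: swap the NER component of `core_name` for the
    one from `ner_name`, registering the NER labels in the core vocab first."""
    import spacy  # imported here so the helper above works without spaCy

    ner_nlp = spacy.load(ner_name)
    nlp = spacy.load(core_name)

    # Take the EntityRecognizer component itself, not the whole Language object.
    ner = ner_nlp.get_pipe("ner")

    # Assumption from the comment above: the labels must exist in the vocab
    # of the pipeline that will run the component.
    add_labels_to_vocab(ner.labels, nlp.vocab.strings)

    nlp.remove_pipe("ner")
    nlp.add_pipe(ner, name="ner", last=True)  # spaCy v2 signature
    return nlp
```

Calling `combine_models()` would return the combined pipeline. Note that `add_pipe` takes a component object only in spaCy v2; in v3 it takes a registered component name, so this sketch would need to be adapted there.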

1 reaction
masonedmison commented, Dec 6, 2019

So unfortunately the best I could do was to manually replace the vocab dir in the core_sci_md folder with the vocab found in the ner_craft_md folder.

I tried various ways to avoid manually copying the vocab folder, but none worked. It seems the tagger and parser depend on the tokenizer in the sci_md model, while the ner relies on the vocab. I tried to follow this logic and create a Language object from the class, but it did not work for me.

After moving the vocab dir, replace the ner pipe in the sci_md model with the ner model as shown below:

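import spacy

# Load the NER model and extract only its EntityRecognizer component.
ner_mod = spacy.load('en_ner_craft_md')
ner_pipe = ner_mod.get_pipe('ner')  # look up by name; 0.2.4 models ship the full pipeline
# Swap the NER component of the core_sci pipeline (spaCy v2 add_pipe API).
nlp = spacy.load('en_core_sci_md')
nlp.remove_pipe('ner')
nlp.add_pipe(ner_pipe, name="ner", last=True)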

