
Can't replace NER pipe in an existing model. Works in 2.2.4 but crashes in 2.3


How to reproduce the behaviour

Hello,

I am trying to replace the NER pipe in spaCy’s en_core_web_md with my own, since I don’t need any of its built-in entity types, but I would like to keep the trained tagger and parser. The docs aren’t entirely clear about this, but this is what I do (based on the ANIMAL training example):

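import spacy

LABEL = "ANIMAL"  # the new entity label from the ANIMAL training example

nlp = spacy.load("en_core_web_md")
nlp.remove_pipe("ner")  # drop the pretrained entity recognizer
ner = nlp.create_pipe("ner")  # fresh, untrained NER component
nlp.add_pipe(ner, last=True)
ner.add_label(LABEL)
train(TRAIN_DATA)  # user-defined; in here I call optimizer = nlp.begin_training() with the new entity
nlp.to_disk("en_core_new_ner")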
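The body of train() is not shown in the issue. Below is a minimal sketch of what such a helper typically looks like in spaCy 2.x, modelled on the new-entity-type training example the reporter cites. The signature train(nlp, train_data, ...), the fixed batch size and the seed are assumptions for illustration; the official example batches with spacy.util.minibatch and compounding sizes instead. The relevant detail is that begin_training() is called inside disable_pipes(), so only the "ner" pipe should be initialised and updated.

```python
import random


def train(nlp, train_data, n_iter=30, batch_size=8, drop=0.35, seed=0):
    """Hypothetical sketch of the reporter's train() helper (spaCy 2.x API).

    train_data is a list of (text, {"entities": [(start, end, label)]}) pairs.
    Only the "ner" pipe is trained; tagger and parser are disabled while training.
    """
    rng = random.Random(seed)
    data = list(train_data)
    other_pipes = [name for name in nlp.pipe_names if name != "ner"]
    losses = {}
    with nlp.disable_pipes(*other_pipes):
        # Called while the other pipes are disabled, so in spaCy 2.x only the
        # pipes still in the pipeline ("ner") are initialised here.
        optimizer = nlp.begin_training()
        for _ in range(n_iter):
            rng.shuffle(data)
            losses = {}
            # Fixed-size batches (assumption); the official example uses
            # spacy.util.minibatch with compounding batch sizes.
            for start in range(0, len(data), batch_size):
                batch = data[start:start + batch_size]
                texts, annotations = zip(*batch)
                nlp.update(texts, annotations, sgd=optimizer, drop=drop, losses=losses)
    # Losses from the final iteration, e.g. {"ner": ...}
    return losses
```

Training data in this shape would look like [("Horses are too tall", {"entities": [(0, 6, "ANIMAL")]})].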

This all goes fine: the model trains (the losses look normal), and it saves to and loads from disk. However, when I create a Doc object with the new model and print its entities, I get this error:

  File "/usr/local/lib/python3.7/site-packages/plac_core.py", line 232, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "brain.py", line 192, in main
    doc = brain.nlp(example)
  File "/usr/local/lib/python3.7/site-packages/spacy/language.py", line 446, in __call__
    doc = proc(doc, **component_cfg.get(name, {}))
  File "pipes.pyx", line 398, in spacy.pipeline.pipes.Tagger.__call__
  File "pipes.pyx", line 443, in spacy.pipeline.pipes.Tagger.set_annotations
  File "morphology.pyx", line 292, in spacy.morphology.Morphology.assign_tag_id
ValueError: [E014] Unknown tag ID: 25

My guess is that calling nlp.begin_training() is also messing up the tagger/parser, but the saved model folder clearly contains the tagger and parser, and I disable the other pipes before training. However, the tag_map in the new model looks different from the tag map in en_core_web_md. My training code is almost identical to the training example: https://spacy.io/usage/training#example-new-entity-type
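To make the tag map comparison concrete, a small helper can list the tags that disappeared, appeared, or changed attributes between two tag maps. diff_tag_maps() is a hypothetical name, and it works on plain dicts; reading those dicts from spaCy 2.x as nlp.vocab.morphology.tag_map is an assumption about that version’s internals.

```python
def diff_tag_maps(original, retrained):
    """Compare two tag maps (dicts mapping tag strings to attribute dicts).

    Hypothetical helper: returns sorted lists of tags missing from the
    retrained map, tags only in the retrained map, and shared tags whose
    attributes differ.
    """
    orig_tags, new_tags = set(original), set(retrained)
    changed = sorted(
        tag for tag in orig_tags & new_tags if original[tag] != retrained[tag]
    )
    return {
        "missing": sorted(orig_tags - new_tags),
        "added": sorted(new_tags - orig_tags),
        "changed": changed,
    }
```

Assuming that attribute exists, one would pass the tag map of spacy.load("en_core_web_md") as original and that of spacy.load("en_core_new_ner") as retrained; a non-empty "missing" list would be consistent with an "Unknown tag ID" error at tagging time.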

Is there any advice on how to cleanly keep the tagger and parser from an existing model but replace the NER component? I don’t want any of the built-in entities, but I also don’t want to have to retrain a tagger and parser.

Your Environment

spaCy version: 2.3.0
Location: /usr/local/lib/python3.7/site-packages/spacy
Platform: Darwin-18.6.0-x86_64-i386-64bit
Python version: 3.7.7
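The reporter is on spaCy 2.3.0, which the maintainer’s reply attributes the tag map bug to (fixed in #5641, with 2.3.1 expected shortly). A hypothetical guard such as is_affected_version() could flag that release before training. It only encodes what the thread says and assumes that 2.3.1 and later carry the fix and that earlier releases such as 2.2.4 (which the title says works) are unaffected.

```python
import re


def is_affected_version(version):
    """Return True if `version` is spaCy 2.3.0, the release named in this report.

    Hypothetical helper; assumes the tag map fix shipped in 2.3.1.
    """
    # Take the leading major.minor.patch numbers, e.g. "2.3.0" -> (2, 3, 0)
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.groups()) == (2, 3, 0)
```

In practice it would be called as is_affected_version(spacy.__version__).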

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

adrianeboyd commented, Jul 3, 2020 (1 reaction)

Thanks for the report! This is due to a bug related to tag maps in 2.3.0 (fixed in #5641). As a workaround, copying the tagger directory from the original model into the new one should work, even if it’s not the prettiest solution. I hope 2.3.1 will be ready today, if not then very soon.
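The copy-the-tagger workaround can be scripted with the standard library. In this sketch, restore_tagger() is a hypothetical helper: source_model_dir is the unmodified model’s data directory (in spaCy 2.x, assumed to be available as spacy.load("en_core_web_md").path) and target_model_dir is the retrained model written by nlp.to_disk(). It deletes the target’s tagger/ directory and copies the original one in its place.

```python
import shutil
from pathlib import Path


def restore_tagger(source_model_dir, target_model_dir):
    """Replace the tagger/ directory of a saved model with the original one.

    Hypothetical helper implementing the suggested spaCy 2.3.0 workaround.
    Returns the path of the restored tagger directory.
    """
    src = Path(source_model_dir) / "tagger"
    dst = Path(target_model_dir) / "tagger"
    if not src.is_dir():
        raise FileNotFoundError(f"no tagger directory in {source_model_dir}")
    if dst.exists():
        # Remove the tagger saved by the affected version before copying.
        shutil.rmtree(dst)
    shutil.copytree(src, dst)
    return dst
```

For the reporter’s setup this would be restore_tagger(spacy.load("en_core_web_md").path, "en_core_new_ner"), after which en_core_new_ner is reloaded. Keeping a backup copy of the saved model first is sensible, since the helper deletes files.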

github-actions[bot] commented, Nov 3, 2021

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
