NER Training does not work when using BILOU tagging
NER training does not work as described in the documentation and tutorials.
Specifically, entity offsets appear to work, but token-level entity labels do not. The documentation also contradicts itself, which adds to the confusion.
Here is the entity-label NER training example:
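import spacy
from spacy.tokens import Doc
from spacy.gold import GoldParse
from spacy.pipeline import EntityRecognizer

nlp = spacy.load('en')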
doc = Doc(nlp.vocab, [u'rats', u'make', u'good', u'pets'])
gold = GoldParse(doc, [u'U-ANIMAL', u'O', u'O', u'O'])
ner = EntityRecognizer(nlp.vocab, entity_types=['ANIMAL'])
ner.update(doc, gold)
As far as I know my syntax is correct. This doesn’t work either:
doc = Doc(nlp.vocab, words=["The", "Law", "and", "Justice", "party", "is", "growing", "in", "Poland"])
gold = GoldParse(doc, ['O', 'B-ORG', 'I-ORG', 'I-ORG', 'L-ORG', 'O', 'O', 'O', 'GPE'])
ner.update(doc, gold)
I get the following error:
TypeError                                 Traceback (most recent call last)
<ipython-input-30-ef64e41c324a> in <module>()
----> 1 ner.update(doc, gold)

/usr/local/lib/python3.5/dist-packages/spacy/syntax/parser.cpython-35m-x86_64-linux-gnu.so in spacy.syntax.parser.Parser.update (spacy/syntax/parser.cpp:7788)()

/usr/local/lib/python3.5/dist-packages/spacy/syntax/ner.cpython-35m-x86_64-linux-gnu.so in spacy.syntax.ner.BiluoPushDown.preprocess_gold (spacy/syntax/ner.cpp:4782)()

/usr/local/lib/python3.5/dist-packages/spacy/syntax/ner.cpython-35m-x86_64-linux-gnu.so in spacy.syntax.ner.BiluoPushDown.lookup_transition (spacy/syntax/ner.cpp:5145)()

TypeError: argument of type 'NoneType' is not iterable
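When debugging an error like this, it can help to confirm first that every tag handed to GoldParse is well-formed BILUO (`O`, or `B-`, `I-`, `L-`, `U-` followed by a label). The sketch below is a plain-Python check that does not use spaCy; `check_biluo` is a hypothetical helper name, not part of any library, and it only inspects the tag strings. It says nothing about what spaCy does internally.

```python
def check_biluo(tags):
    """Return (index, tag, problem) tuples for tags that break the BILUO scheme."""
    problems = []
    open_label = None  # label of the multi-token entity currently open, if any
    for i, tag in enumerate(tags):
        if tag == "O":
            prefix, label = "O", None
        elif len(tag) > 2 and tag[1] == "-" and tag[0] in "BILU":
            prefix, label = tag[0], tag[2:]
        else:
            # No recognised prefix, e.g. a bare label such as 'GPE'.
            problems.append((i, tag, "not a BILUO tag"))
            open_label = None
            continue
        if prefix in "IL":
            # I- and L- must continue an entity opened by B- with the same label.
            if open_label != label:
                problems.append((i, tag, "I-/L- without matching B-"))
        elif open_label is not None:
            # O, B- or U- arrived while a B-/I- entity was still open.
            problems.append((i, tag, "previous entity not closed with L-"))
        open_label = label if prefix in "BI" else None
    if open_label is not None:
        problems.append((len(tags) - 1, tags[-1], "entity not closed with L-"))
    return problems


first = ['U-ANIMAL', 'O', 'O', 'O']
second = ['O', 'B-ORG', 'I-ORG', 'I-ORG', 'L-ORG', 'O', 'O', 'O', 'GPE']
print(check_biluo(first))
print(check_biluo(second))
```

Run on the two tag lists from this report, the check passes the first list and flags the final `'GPE'` in the second, which has no `U-` prefix. That may or may not be related to the traceback above, since the first example fails as well.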
I think this is a bug in GoldParse, since entity offsets appear to work. For example:
ner = EntityRecognizer(nlp.vocab, entity_types=['ORG'])
doc = nlp.make_doc('The Law and Justice party is growing')
gold = GoldParse(doc, entities=[(4,25,'ORG')])
ner.update(doc, gold)
ner(doc)
print(doc.ents[0].text,doc.ents[0].label_)
--> Law and Justice party ORG
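To make the relationship between the two annotation styles concrete, here is a rough sketch of how character offsets such as `(4, 25, 'ORG')` map onto per-token BILUO tags. It assumes simple whitespace tokenization, which is not how spaCy's tokenizer works in general, and `offsets_to_biluo` is a hypothetical helper written for illustration only.

```python
def offsets_to_biluo(text, entities):
    """Map (start_char, end_char, label) spans to one BILUO tag per whitespace token."""
    # Record the character span of every whitespace-delimited token.
    tokens, pos = [], 0
    for word in text.split():
        start = text.index(word, pos)
        tokens.append((start, start + len(word)))
        pos = start + len(word)

    tags = ["O"] * len(tokens)
    for ent_start, ent_end, label in entities:
        # Tokens that lie entirely inside the entity span.
        covered = [i for i, (s, e) in enumerate(tokens)
                   if s >= ent_start and e <= ent_end]
        if not covered:
            continue  # span does not line up with any token; left untagged here
        if len(covered) == 1:
            tags[covered[0]] = "U-" + label
        else:
            tags[covered[0]] = "B-" + label
            for i in covered[1:-1]:
                tags[i] = "I-" + label
            tags[covered[-1]] = "L-" + label
    return tags


print(offsets_to_biluo('The Law and Justice party is growing', [(4, 25, 'ORG')]))
# ['O', 'B-ORG', 'I-ORG', 'I-ORG', 'L-ORG', 'O', 'O']
```

For this sentence the result matches the ORG tags written by hand in the earlier token-level example, so the two annotations describe the same entity.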
Also, the documentation is very inconsistent/confusing right now.
Conflicting examples:
- https://spacy.io/docs/usage/training
- https://spacy.io/docs/usage/entity-recognition
- https://github.com/explosion/spaCy/blob/master/examples/training/train_ner.py
Example 1 does not work. Example 2 works for entity offsets but not for token-level entity annotation. Example 3 is linked from example 1 as the 'full example', yet the two are completely different examples.
Your Environment
Ubuntu, Python 3.5.2, spaCy 1.2, latest pip
Top GitHub Comments
Fixed in v1.8.0 🎉