
NER Training does not work when using BILOU tagging


NER training does not work as described in the documentation and tutorials.

Specifically, training with character offsets appears to work, but training with token-level entity labels does not. The documentation also contradicts itself, which makes the situation more confusing.

For example, using the entity-label NER training example from the documentation:

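import spacy
from spacy.tokens import Doc
from spacy.gold import GoldParse
from spacy.pipeline import EntityRecognizer
nlp = spacy.load('en')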
doc = Doc(nlp.vocab, [u'rats', u'make', u'good', u'pets'])
gold = GoldParse(doc, [u'U-ANIMAL', u'O', u'O', u'O'])
ner = EntityRecognizer(nlp.vocab, entity_types=['ANIMAL'])
ner.update(doc, gold)

As far as I know, my syntax is correct. This doesn’t work either:

doc = Doc(nlp.vocab, words=["The", "Law", "and", "Justice", "party", "is", "growing", "in", "Poland"])
gold = GoldParse(doc, ['O', 'B-ORG', 'I-ORG', 'I-ORG', 'L-ORG', 'O', 'O', 'O', 'GPE'])
ner.update(doc, gold)

I get the following error:

TypeError                                 Traceback (most recent call last)
<ipython-input-30-ef64e41c324a> in <module>()
----> 1 ner.update(doc, gold)

/usr/local/lib/python3.5/dist-packages/spacy/syntax/parser.cpython-35m-x86_64-linux-gnu.so in spacy.syntax.parser.Parser.update (spacy/syntax/parser.cpp:7788)()

/usr/local/lib/python3.5/dist-packages/spacy/syntax/ner.cpython-35m-x86_64-linux-gnu.so in spacy.syntax.ner.BiluoPushDown.preprocess_gold (spacy/syntax/ner.cpp:4782)()

/usr/local/lib/python3.5/dist-packages/spacy/syntax/ner.cpython-35m-x86_64-linux-gnu.so in spacy.syntax.ner.BiluoPushDown.lookup_transition (spacy/syntax/ner.cpp:5145)()

TypeError: argument of type 'NoneType' is not iterable
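As an aside, independent of the GoldParse bug: the tag list in the second snippet ends with 'GPE', which has no B-/I-/L-/U- prefix and so is not a well-formed BILOU tag (a single-token entity would be 'U-GPE'). The sketch below uses a hypothetical helper, biluo_errors, which is not part of spaCy, to show how such tags can be caught before they are passed to GoldParse:

```python
VALID_PREFIXES = ("B", "I", "L", "U")


def biluo_errors(tags):
    """Return (index, message) pairs describing problems in a BILOU sequence.

    Hypothetical helper for illustration only; it is not part of spaCy.
    """
    errors = []
    open_label = None  # label of a B- entity that has not been closed yet
    for i, tag in enumerate(tags):
        if tag == "O":
            if open_label is not None:
                errors.append((i, "entity %s not closed with L-" % open_label))
                open_label = None
            continue
        prefix, sep, label = tag.partition("-")
        if not sep or prefix not in VALID_PREFIXES or not label:
            # e.g. a bare label such as 'GPE' with no BILOU prefix
            errors.append((i, "malformed tag %r" % tag))
            open_label = None
        elif prefix == "B":
            if open_label is not None:
                errors.append((i, "entity %s not closed before B-" % open_label))
            open_label = label
        elif prefix == "I":
            if open_label != label:
                errors.append((i, "I-%s without a matching B-%s" % (label, label)))
            open_label = label
        elif prefix == "L":
            if open_label != label:
                errors.append((i, "L-%s without a matching B-%s" % (label, label)))
            open_label = None
        else:  # prefix == "U": a single-token entity
            if open_label is not None:
                errors.append((i, "entity %s not closed before U-" % open_label))
            open_label = None
    if open_label is not None:
        errors.append((len(tags), "entity %s not closed at end" % open_label))
    return errors


print(biluo_errors(["U-ANIMAL", "O", "O", "O"]))  # → []
print(biluo_errors(["O", "B-ORG", "I-ORG", "I-ORG", "L-ORG", "O", "O", "O", "GPE"]))
# → [(8, "malformed tag 'GPE'")]
```

The checker only looks at tag structure; it says nothing about whether a label such as ORG is among the entity types the recognizer was created with.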

I think this is a bug in GoldParse, since training with character offsets does work, e.g.:

ner = EntityRecognizer(nlp.vocab, entity_types=['ORG'])
doc = nlp.make_doc('The Law and Justice party is growing')
gold = GoldParse(doc, entities=[(4,25,'ORG')])
ner.update(doc, gold)
ner(doc)
print(doc.ents[0].text,doc.ents[0].label_)
--> Law and Justice party ORG
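The offset annotation (4, 25, 'ORG') and the token tags 'B-ORG', 'I-ORG', 'I-ORG', 'L-ORG' from the failing snippet describe the same entity. The sketch below shows the conversion with a hypothetical offsets_to_biluo helper; it assumes words are separated by single spaces instead of using spaCy's tokenizer. Later spaCy releases ship a real converter (spacy.gold.biluo_tags_from_offsets in v2, spacy.training.offsets_to_biluo_tags in v3).

```python
def offsets_to_biluo(text, entities):
    """Convert (start_char, end_char, label) spans to per-token BILOU tags.

    Sketch only: tokens are found by splitting on single spaces, unlike
    spaCy's real tokenizer. Tokens only partly covered by a span are
    skipped, so misaligned spans are not reported here.
    """
    # Compute (start, end) character offsets for each space-separated token.
    tokens = []
    pos = 0
    for word in text.split(" "):
        tokens.append((pos, pos + len(word)))
        pos += len(word) + 1  # skip the separating space
    tags = ["O"] * len(tokens)
    for start, end, label in entities:
        # Indices of the tokens that lie entirely inside the character span.
        idx = [i for i, (s, e) in enumerate(tokens) if s >= start and e <= end]
        if not idx:
            continue
        if len(idx) == 1:
            tags[idx[0]] = "U-" + label
        else:
            tags[idx[0]] = "B-" + label
            for i in idx[1:-1]:
                tags[i] = "I-" + label
            tags[idx[-1]] = "L-" + label
    return tags


print(offsets_to_biluo("The Law and Justice party is growing", [(4, 25, "ORG")]))
# → ['O', 'B-ORG', 'I-ORG', 'I-ORG', 'L-ORG', 'O', 'O']
```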

Also, the documentation is currently inconsistent and confusing.

Conflicting examples:

Example 1 does not work. Example 2 works with character offsets but not with token-level entity annotation. Example 3 is linked from example 1 (as the ‘full example’), yet the two are completely different examples.

Your Environment

Operating System: Ubuntu
Python Version Used: 3.5.2
spaCy Version Used: 1.2
Environment Information: latest pip

Issue Analytics

Created 7 years ago
Comments: 7 (2 by maintainers)

Top GitHub Comments

3 reactions

honnibal commented, Apr 16, 2017

Fixed in v1.8.0 🎉


>>> import spacy
>>> nlp = spacy.load('en')
>>> from spacy.gold import GoldParse
>>> doc = nlp.make_doc(u'Facebook is a company')
>>> nlp.tagger(doc)
>>> gold = GoldParse(doc, entities=['U-ORG', 'O', 'O', 'O'])
>>> [t for t in gold.ner]
['U-ORG', 'O', 'O', 'O']
>>> nlp.entity.update(doc, gold)
1.0
>>> nlp.entity.update(doc, gold)
1.0
>>> nlp.entity.update(doc, gold)
0.0
>>> nlp.entity(doc)
>>> for ent in doc.ents:
...     print(ent.text, ent.label_)
...
Facebook ORG
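GoldParse takes per-token BILOU tags, while doc.ents reports whole entity spans. The grouping from tags to spans can be sketched in plain Python; biluo_to_spans is a hypothetical helper written for illustration, not spaCy's implementation, and it assumes the input sequence is already well-formed:

```python
def biluo_to_spans(tags):
    """Group BILOU tags into (start_token, end_token, label) spans.

    end_token is exclusive, like a Python slice. Hypothetical helper;
    malformed sequences are not validated here.
    """
    spans = []
    start = None  # index of the most recent B- tag, if an entity is open
    for i, tag in enumerate(tags):
        prefix, _, label = tag.partition("-")
        if prefix == "U":
            spans.append((i, i + 1, label))
        elif prefix == "B":
            start = i
        elif prefix == "L" and start is not None:
            spans.append((start, i + 1, label))
            start = None
        # 'O' and 'I-' tags need no action in this sketch
    return spans


words = ["Facebook", "is", "a", "company"]
for start, end, label in biluo_to_spans(["U-ORG", "O", "O", "O"]):
    print(" ".join(words[start:end]), label)  # → Facebook ORG
```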

0 reactions

lock[bot] commented, May 8, 2018

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
