NER Training does not work when using BILOU tagging

NER training does not work as described in the documentation/tutorials.

Specifically, training with offsets appears to work, but training with token-level entity labels does not. The documentation also contradicts itself, which adds to the confusion.

Using the entity-label NER training example, e.g.:

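import spacy
from spacy.tokens import Doc
from spacy.gold import GoldParse
from spacy.pipeline import EntityRecognizer

nlp = spacy.load('en')
doc = Doc(nlp.vocab, [u'rats', u'make', u'good', u'pets'])
gold = GoldParse(doc, [u'U-ANIMAL', u'O', u'O', u'O'])
ner = EntityRecognizer(nlp.vocab, entity_types=['ANIMAL'])
ner.update(doc, gold)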

As far as I know my syntax is correct. This doesn’t work either:

doc = Doc(nlp.vocab, words=["The", "Law", "and", "Justice", "party", "is", "growing", "in", "Poland"])
gold = GoldParse(doc, ['O', 'B-ORG', 'I-ORG', 'I-ORG', 'L-ORG', 'O', 'O', 'O', 'GPE'])
ner.update(doc, gold)

I get the following error:

TypeError                                 Traceback (most recent call last)
<ipython-input-30-ef64e41c324a> in <module>()
----> 1 ner.update(doc, gold)

/usr/local/lib/python3.5/dist-packages/spacy/syntax/parser.cpython-35m-x86_64-linux-gnu.so in spacy.syntax.parser.Parser.update (spacy/syntax/parser.cpp:7788)()

/usr/local/lib/python3.5/dist-packages/spacy/syntax/ner.cpython-35m-x86_64-linux-gnu.so in spacy.syntax.ner.BiluoPushDown.preprocess_gold (spacy/syntax/ner.cpp:4782)()

/usr/local/lib/python3.5/dist-packages/spacy/syntax/ner.cpython-35m-x86_64-linux-gnu.so in spacy.syntax.ner.BiluoPushDown.lookup_transition (spacy/syntax/ner.cpp:5145)()

TypeError: argument of type 'NoneType' is not iterable

I think this is a bug in GoldParse since offsets appear to work.
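One detail worth checking in the second snippet: its final tag is written as 'GPE' with no B/I/L/U prefix, whereas the other tags follow the prefix-label form. Whether that contributes to the TypeError is not established here, since the first example fails even with well-formed tags. The sketch below is a plain-Python sanity check; `find_malformed_tags` is a hypothetical helper, not part of spaCy.

```python
import re

# Hypothetical helper, not a spaCy API: a tag is accepted if it is "O"
# or has the form <prefix>-<label> with prefix in B, I, L, U.
_TAG_RE = re.compile(r"^(?:O|[BILU]-.+)$")


def find_malformed_tags(tags):
    """Return (index, tag) pairs for tags that do not look like BILOU tags."""
    return [(i, tag) for i, tag in enumerate(tags) if not _TAG_RE.match(tag)]


second_example = ['O', 'B-ORG', 'I-ORG', 'I-ORG', 'L-ORG', 'O', 'O', 'O', 'GPE']
malformed = find_malformed_tags(second_example)
print(malformed)  # [(8, 'GPE')]
```

Running the same check on the first example's tags returns an empty list, so a malformed tag cannot be the whole story.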

e.g.

ner = EntityRecognizer(nlp.vocab, entity_types=['ORG'])
doc = nlp.make_doc('The Law and Justice party is growing')
gold = GoldParse(doc, entities=[(4,25,'ORG')])
ner.update(doc, gold)
ner(doc)
print(doc.ents[0].text,doc.ents[0].label_)
--> Law and Justice party ORG
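To make the relationship between the two annotation styles concrete, here is a minimal sketch of how a character-offset annotation such as (4, 25, 'ORG') corresponds to token-level BILOU tags. It assumes whitespace tokenization, which is simpler than spaCy's tokenizer, and `offsets_to_biluo` is a hypothetical helper written for illustration, not the library's own conversion routine.

```python
def offsets_to_biluo(text, entities):
    """Map (start_char, end_char, label) spans onto whitespace tokens."""
    # Assumption: tokens are whitespace-separated words.
    token_bounds = []
    pos = 0
    for word in text.split():
        start = text.index(word, pos)
        end = start + len(word)
        token_bounds.append((start, end))
        pos = end

    tags = ['O'] * len(token_bounds)
    for ent_start, ent_end, label in entities:
        # Indices of tokens that lie fully inside the entity span.
        covered = [i for i, (s, e) in enumerate(token_bounds)
                   if s >= ent_start and e <= ent_end]
        if not covered:
            continue  # no token lies fully inside this span
        if len(covered) == 1:
            tags[covered[0]] = 'U-' + label
        else:
            tags[covered[0]] = 'B-' + label
            for i in covered[1:-1]:
                tags[i] = 'I-' + label
            tags[covered[-1]] = 'L-' + label
    return tags


biluo = offsets_to_biluo('The Law and Justice party is growing', [(4, 25, 'ORG')])
print(biluo)
# ['O', 'B-ORG', 'I-ORG', 'I-ORG', 'L-ORG', 'O', 'O']
```

Under these assumptions, the offsets that train successfully describe the same entity as the B-ORG ... L-ORG tag sequence in the failing example.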

Also, the documentation is very inconsistent/confusing right now.

Conflicting examples:

  1. https://spacy.io/docs/usage/training
  2. https://spacy.io/docs/usage/entity-recognition
  3. https://github.com/explosion/spaCy/blob/master/examples/training/train_ner.py

Example 1 does not work. Example 2 works for offsets but not for token-level entity annotation. Example 3 is linked from example 1 (as the 'full example'), but the two are totally different examples.

Your Environment

Ubuntu Python 3.5.2 1.2, latest PIP

Issue Analytics

  • State: closed
  • Created 7 years ago
  • Comments: 7 (2 by maintainers)

Top GitHub Comments

honnibal commented, Apr 16, 2017

Fixed in v1.8.0 🎉


>>> import spacy
>>> nlp = spacy.load('en')
>>> from spacy.gold import GoldParse
>>> doc = nlp.make_doc(u'Facebook is a company')
>>> nlp.tagger(doc)
>>> gold = GoldParse(doc, entities=['U-ORG', 'O', 'O', 'O'])
>>> [t for t in gold.ner]
['U-ORG', 'O', 'O', 'O']
>>> nlp.entity.update(doc, gold)
1.0
>>> nlp.entity.update(doc, gold)
1.0
>>> nlp.entity.update(doc, gold)
0.0
>>> nlp.entity(doc)
>>> for ent in doc.ents:
...   print(ent.text, ent.label_)
...
Facebook ORG
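As a rough illustration of what the token-level tags in this session encode, the sketch below decodes a BILOU tag list back into token spans. `biluo_to_spans` is a hypothetical helper, not a spaCy function, and it assumes well-formed tags (it does not check that labels within one span agree).

```python
def biluo_to_spans(tags):
    """Convert BILOU tags to (start_token, end_token_exclusive, label) spans."""
    spans = []
    start = None  # index of the token that opened the current multi-token span
    for i, tag in enumerate(tags):
        if tag == 'O':
            start = None
            continue
        prefix, label = tag.split('-', 1)
        if prefix == 'U':
            spans.append((i, i + 1, label))
            start = None
        elif prefix == 'B':
            start = i
        elif prefix == 'L' and start is not None:
            spans.append((start, i + 1, label))
            start = None
        # 'I' tags simply continue the currently open span.
    return spans


spans = biluo_to_spans(['U-ORG', 'O', 'O', 'O'])
print(spans)  # [(0, 1, 'ORG')]
```

The single U-ORG tag decodes to a one-token span over the first token, which matches the "Facebook ORG" entity printed at the end of the session.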

lock[bot] commented, May 8, 2018

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

