UserWarning: [W030] Some entities could not be aligned in the text ...
Since upgrading to the latest spaCy 2.3.0 (I think from 2.2.4, but I am not sure), I repeatedly get the following warning, always related to the same character ('-'):
lib/python3.7/site-packages/spacy/language.py:479: UserWarning: [W030] Some entities could not be aligned in the text ... Use `spacy.gold.biluo_tags_from_offsets(nlp.make_doc(text), entities)` to check the alignment. Misaligned entities ('-') will be ignored during training.
gold = GoldParse(doc, **gold)
What does this warning mean? When does it occur?
Your Environment
Info about spaCy
- spaCy version: 2.3.0
- Platform: Linux-4.15.0-101-generic-x86_64-with-Ubuntu-18.04-bionic
- Python version: 3.7.4
Issue Analytics
- Created 3 years ago
- Comments:10 (6 by maintainers)
Top GitHub Comments
Hi @amaarora, this warning occurs when your “gold” entity offsets do not align with token boundaries as set by `nlp.make_doc`.
In your last example, you see for instance that STREET (“Alawa Crescent”) could be aligned as the second (B-STREET) and third (L-STREET) token, but the first token (“Unit4,1”) was kept as 1 token by the tokenizer and got 3 different entity types assigned to it (PROPERTY_TYPE, UNIT_NUMBER and STREET_RANGE), which resulted in a - instead, because one token can only refer to one entity.
You have three options:
Unit 4, 1 Alawa Crescent ...
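To make the alignment problem above concrete, here is a minimal sketch in plain Python. It does not use spaCy: the whitespace tokenizer and the `biluo_tags` helper are hypothetical stand-ins for `nlp.make_doc` and `spacy.gold.biluo_tags_from_offsets`, and the character offsets are assumed for illustration, since the original annotations are not shown in this thread.

```python
import re

def tokenize(text):
    """Hypothetical whitespace tokenizer: returns (start, end) character spans."""
    return [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]

def biluo_tags(text, entities):
    """Assign BILUO tags; tokens touched by a misaligned entity get '-'."""
    tokens = tokenize(text)
    start_to_index = {s: i for i, (s, _) in enumerate(tokens)}
    end_to_index = {e: i for i, (_, e) in enumerate(tokens)}
    tags = ["O"] * len(tokens)
    misaligned = []
    for start, end, label in entities:
        first = start_to_index.get(start)
        last = end_to_index.get(end)
        if first is None or last is None:
            # The entity starts or ends in the middle of a token.
            misaligned.append((start, end))
        elif first == last:
            tags[first] = "U-" + label
        else:
            tags[first] = "B-" + label
            for i in range(first + 1, last):
                tags[i] = "I-" + label
            tags[last] = "L-" + label
    # Mark every token that overlaps a misaligned entity.
    for start, end in misaligned:
        for i, (s, e) in enumerate(tokens):
            if s < end and start < e:
                tags[i] = "-"
    return tags

text = "Unit4,1 Alawa Crescent"
# Assumed offsets: three entities all fall inside the single token "Unit4,1".
entities = [
    (0, 4, "PROPERTY_TYPE"),
    (4, 5, "UNIT_NUMBER"),
    (6, 7, "STREET_RANGE"),
    (8, 22, "STREET"),
]
tags = biluo_tags(text, entities)
print(tags)  # ['-', 'B-STREET', 'L-STREET']
```

The first token gets `-` because none of the three entities inside it starts and ends on a token boundary, while STREET lines up with the second and third tokens.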
It’s completely normal for a single entity to refer to multiple tokens; that should not cause problems. This warning indicates something weird with your annotations or tokenization. In your case you have very strange punctuation, so it’s probably related to that, but I would need to see the whole sentence and annotations to say more.
As a note for you and anyone who reads this issue in the future, if you need help with a specific case, please provide this information:
To repeat Adriane’s example of the kind of problem that causes this warning:
“Sus” in “Susan” cannot be meaningfully assigned to an entity, because you can’t have an entity on half a token. If you are getting this warning you need to look at your annotations and tokenization to figure out why it is happening, because your misaligned annotations are unusable.
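As a rough way to track down such annotations before training, the check can be sketched without spaCy. Everything here is an assumption for illustration: the sentence "Susan went home", the whitespace tokenizer, and the `misaligned_entities` helper are hypothetical, and in practice you would compare against the token boundaries produced by `nlp.make_doc(text)`.

```python
import re

def token_spans(text):
    """Hypothetical whitespace tokenizer: returns (start, end) character spans."""
    return [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]

def misaligned_entities(text, entities):
    """Return (entity_text, label, overlapped_tokens) for each entity
    whose start or end offset is not on a token boundary."""
    spans = token_spans(text)
    token_starts = {s for s, _ in spans}
    token_ends = {e for _, e in spans}
    problems = []
    for start, end, label in entities:
        if start not in token_starts or end not in token_ends:
            overlapped = [text[s:e] for s, e in spans if s < end and start < e]
            problems.append((text[start:end], label, overlapped))
    return problems

text = "Susan went home"
bad = misaligned_entities(text, [(0, 3, "PERSON")])   # covers only "Sus"
good = misaligned_entities(text, [(0, 5, "PERSON")])  # covers all of "Susan"
print(bad)   # [('Sus', 'PERSON', ['Susan'])]
print(good)  # []
```

Printing the entity text next to the token it cuts through usually makes it obvious whether the offsets or the tokenization need fixing.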