English POS tagger incorrectly tags a word as PUNCT
How to reproduce the behaviour
import spacy
nlp = spacy.load("en_core_web_md")
assert nlp("back scatter")[1].pos_ != "PUNCT"
What’s wrong: The POS for scatter should definitely not be PUNCT
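The check above can be generalised into a small sanity test: a token made up only of letters should not come back with the coarse tag PUNCT. The following is a minimal sketch under that assumption, not part of spaCy itself. The helper name `alphabetic_tokens_tagged_punct` is hypothetical, and it works on plain (text, pos) pairs so it runs without a model installed. The tag shown for "back" is a placeholder and is not taken from the report; only the PUNCT tag on "scatter" is.

```python
def alphabetic_tokens_tagged_punct(tagged_tokens):
    """Return the texts of purely alphabetic tokens whose coarse POS is PUNCT.

    `tagged_tokens` is any iterable of (text, pos) pairs.
    """
    return [
        text
        for text, pos in tagged_tokens
        if text.isalpha() and pos == "PUNCT"
    ]


# With spaCy installed, the pairs could be built from a loaded pipeline:
#   doc = nlp("back scatter")
#   pairs = [(token.text, token.pos_) for token in doc]
# Here they are written by hand to mirror the reported behaviour.
# The "ADV" tag for "back" is a placeholder, not part of the report.
reported = [("back", "ADV"), ("scatter", "PUNCT")]
print(alphabetic_tokens_tagged_punct(reported))  # ['scatter']
```

Running a helper like this over a list of short phrases would show whether the problem is limited to "back scatter" or affects other two-word inputs as well.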
Your Environment
- spaCy version: 2.2.4
- Platform: Linux-5.4.0-37-generic-x86_64-with-glibc2.29
- Python version: 3.8.2
Issue Analytics
- State:
- Created 3 years ago
- Comments: 5 (3 by maintainers)
Top GitHub Comments
Because of how POS taggers generally work, I personally think it’s always going to be a challenge to get such results with high accuracy. But I guess you could try to retrain the tagger on phrases only.