Lemma_ for "I" returns weird value: -PRON-
Hey,
I noticed something weird when finding the lemma_ of tokens.
When I find the lemma_ for the token ‘cakes’ with nlp("cakes")[0].lemma_, I get what is expected: ‘cake’.
The same applies to nlp("i")[0].lemma_, which gives ‘i’. However, I get some weird behavior when I use an uppercase “I”, as in “I am hungry”.
>>> import spacy
>>> nlp = spacy.load('en')
>>> print(nlp("I")[0].lemma_)
-PRON-
I’m not sure if this is intended behavior, or a bug. If it’s a bug, is this something that’s been encountered before?
I’m running spacy 1.7.3 on osx.
- spaCy version: 1.7.3
- Platform: Darwin-16.4.0-x86_64-i386-64bit
- Python version: 3.6.0
- Installed models: en
Issue Analytics
- Created 6 years ago
- Comments: 11 (4 by maintainers)
Top GitHub Comments
I’ll repost my argument against ‘-PRON-’ lemmas here to make it visible to other interested participants: lemmas should arguably be part of the language. I’m not a lexicographer or linguist, but looking at the definitions, I’m almost certain that this is the case. There is a practical reason too: lemmatisation may be used directly for looking up items in external lexical resources. Using an artificial lemma guarantees that nothing will be found.
The look-up argument is decisive: the -PRON- lemma will be reversed in spaCy 2. It sucks to change this, but it’s better to be correct going forward.
Thanks @adam-ra for your input on this
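For anyone who needs look-ups to work on a version that still returns -PRON-, here is a minimal sketch of one possible workaround: fall back to the lowercased token text whenever the lemma is the placeholder. The helper name lookup_form is hypothetical and not part of spaCy; it only assumes the token exposes lemma_ and lower_, as spaCy tokens do. The stand-in objects below exist only so the sketch runs without a model installed.

```python
from types import SimpleNamespace

# Placeholder lemma reported in this thread for personal pronouns.
PRON_LEMMA = "-PRON-"


def lookup_form(token):
    """Return a string suitable for look-up in an external lexical resource.

    `token` is any object with `lemma_` and `lower_` attributes
    (spaCy tokens have both). This helper is hypothetical, not a spaCy API.
    """
    if token.lemma_ == PRON_LEMMA:
        # The artificial lemma would never match a dictionary entry,
        # so use the lowercased surface form instead.
        return token.lower_
    return token.lemma_


# Stand-ins for spaCy tokens, mirroring the values reported in this issue.
tokens = [
    SimpleNamespace(text="I", lower_="i", lemma_="-PRON-"),
    SimpleNamespace(text="cakes", lower_="cakes", lemma_="cake"),
]

forms = [lookup_form(t) for t in tokens]
print(forms)
```

With a loaded pipeline the same helper could be applied as [lookup_form(t) for t in nlp("I am hungry")]. Note that this does not map inflected pronouns such as “me” back to “I”; it only avoids the artificial symbol.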