Lemma_ for "I" returns weird value: -PRON-

Hey,

I noticed something weird when finding the lemma_ of tokens. For ‘cakes’, nlp("cakes")[0].lemma_ gives ‘cake’, as expected. The same applies to nlp("i")[0].lemma_, which gives ‘i’. However, I get some weird behavior when I use an uppercase “I”, as in “I am hungry”:

>>> import spacy
>>> nlp = spacy.load('en')
>>> print(nlp("I")[0].lemma_)
-PRON-

I’m not sure if this is intended behavior, or a bug. If it’s a bug, is this something that’s been encountered before?

I’m running spacy 1.7.3 on osx.

spaCy version: 1.7.3
Platform: Darwin-16.4.0-x86_64-i386-64bit
Python version: 3.6.0
Installed models: en

Top GitHub Comments

adam-ra commented, Apr 12, 2017

I’ll repost my argument against ‘-PRON-’ lemmas here to make it visible to other interested participants: lemmas should arguably be part of the language. I’m not a lexicographer or a linguist, but looking at the definitions, I’m almost certain that this is the case. There is a practical reason too: lemmatisation may be used directly for looking up items in external lexical resources, and an artificial lemma guarantees that nothing will be found.
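
To make the look-up argument concrete, here is a minimal sketch. The `LEXICON` dictionary is a hypothetical stand-in for an external lexical resource, and the (token, lemma) pairs are written by hand to mirror the output reported in this issue rather than produced by running spaCy.

```python
# Hypothetical stand-in for an external lexical resource keyed by lemma.
LEXICON = {
    "i": "first-person singular pronoun",
    "cake": "a sweet baked food",
}


def look_up(lemma):
    """Return the lexicon entry for a lemma, or None if it is missing."""
    return LEXICON.get(lemma.lower())


# Hand-written (token, lemma) pairs mirroring the behaviour reported in this issue.
pairs = [("cakes", "cake"), ("I", "-PRON-")]

# Look up each token's lemma; an artificial lemma has no entry to find.
results = {token: look_up(lemma) for token, lemma in pairs}
```

Here the entry for ‘cakes’ is found through its lemma ‘cake’, while the entry for ‘I’ is missed because ‘-PRON-’ is not a word in the resource.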

honnibal commented, Apr 13, 2017

The look-up argument is decisive: the -PRON- lemma will be reversed in spaCy 2.

It sucks to change this, but it’s better to be correct going forward.

Thanks @adam-ra for your input on this
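
For anyone affected in the meantime, one possible workaround is to fall back to the lowercased token text whenever the lemma is the ‘-PRON-’ placeholder. This is only a sketch: `lemma_or_text` is a hypothetical helper, not part of spaCy, and lowercasing the surface form is an assumption that does not map inflected pronouns such as ‘me’ to a common base form.

```python
# Placeholder lemma reported in this issue for pronouns.
PRON_LEMMA = "-PRON-"


def lemma_or_text(text, lemma):
    """Return the lemma, or the lowercased text if the lemma is the placeholder."""
    if lemma == PRON_LEMMA:
        return text.lower()
    return lemma
```

With a spaCy token this could be called as `lemma_or_text(token.text, token.lemma_)`.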