Lemmatizer issue for German words

I am confused about the lemmatizer. Take the sentence "Ich sehe Bäume" (I see trees):

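import spacy

# Load the small German model and run the pipeline on the example sentence
nlp = spacy.load('de_core_news_sm')
doc = nlp(u'Ich sehe Bäume')

for token in doc:
    # token.lemma is the integer hash; token.lemma_ is the readable string
    print(token.text, token.lemma, token.lemma_, token.pos_)
    print("has_vector:", token.has_vector)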

token.lemma_ is just Bäume. I thought it would be lemmatized to the singular form Baum (tree)?

Top GitHub Comments

ines commented, May 25, 2018

Yes, Baum would definitely be correct here. The German lemmatizer only uses lookup tables (and no rule-based process like the English one). This has some limitations – I’ve written a bit more about this in my comment on this thread.
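To illustrate the limitation described above, here is a minimal sketch of a lookup-only lemmatizer. The table contents and the `lookup_lemma` helper are hypothetical and are not spaCy's actual data or API; the sketch assumes that a form missing from the table is returned unchanged, which would explain why Bäume comes back as Bäume.

```python
# Toy lookup table: hypothetical entries, NOT spaCy's actual German data.
LOOKUP = {
    "sehe": "sehen",
    "Häuser": "Haus",
    # Note: no entry for "Bäume", so the lookup cannot map it to "Baum".
}


def lookup_lemma(word, table=LOOKUP):
    """Return the table entry for `word`, or `word` itself if there is none."""
    return table.get(word, word)


for word in ["Ich", "sehe", "Bäume"]:
    print(word, "->", lookup_lemma(word))
```

With no rules to fall back on, coverage depends entirely on which inflected forms happen to be in the table.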

Another problem is that spaCy will always decide on one lemma (and won’t just give you a bunch of options to choose from). This is convenient, but it also means that the one pick has to be correct. That said, there have definitely been some suspicious reports around the lemmatization performance that might indicate a bug.
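The trade-off between a single pick and a list of options can be sketched as follows. The `CANDIDATES` table and both helper functions are hypothetical illustrations, not part of spaCy; the example relies on the German form gehört, which can belong to either hören (to hear) or gehören (to belong).

```python
# Hypothetical candidate table: one surface form, several possible lemmas.
CANDIDATES = {
    "gehört": ["hören", "gehören"],
}


def all_lemmas(word):
    """Return every candidate lemma, or the word itself if it is unknown."""
    return CANDIDATES.get(word, [word])


def single_lemma(word):
    """Commit to one lemma (here simply the first candidate)."""
    return all_lemmas(word)[0]


print(single_lemma("gehört"))  # one answer, right or wrong for the context
print(all_lemmas("gehört"))    # the caller would have to disambiguate
```

A single-pick interface is simpler to consume, but a wrong pick is invisible to the caller unless the alternatives are exposed as well.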

In the meantime, you might want to check out the spacy-iwnlp extensions by @Liebeck and see how it performs on your use case!
