Lemmatizer issue for German words
I am confused about the lemmatizer. Take the sentence Ich sehe Bäume (I see trees).
import spacy

nlp = spacy.load('de_core_news_sm')
doc = nlp(u'Ich sehe Bäume')
for token in doc:
    # token.lemma is the integer hash ID, token.lemma_ is the lemma string
    print(token.text, token.lemma, token.lemma_, token.pos_)
    print("has_vector:", token.has_vector)
token.lemma_ is just Bäume. I thought it would be lemmatized to the singular form Baum (tree)?
Yes, Baum would definitely be correct here. The German lemmatizer only uses lookup tables (and no rule-based process like the English one). This has some limitations. I've written a bit more about this in my comment on this thread.
Another problem is that spaCy will always decide on one lemma (and won't just give you a bunch of options to choose from). This is convenient, but it also means that the one pick has to be correct. That said, there have definitely been some suspicious reports about lemmatization performance that might indicate a bug.
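To make the lookup-table limitation concrete, here is a minimal sketch of how a pure lookup lemmatizer behaves. The table entries and the function names (LOOKUP, lookup_lemma, lemmatize) are made up for illustration and are not spaCy's actual data or API. The sketch assumes, purely as an example, that the plural form is missing from the table, in which case the word comes back unchanged.

```python
# Toy lookup table: surface form -> exactly one lemma.
# Hypothetical entries for illustration, not spaCy's German data.
LOOKUP = {
    "sehe": "sehen",
    "Häuser": "Haus",
}


def lookup_lemma(word, table=LOOKUP):
    """Return the table's lemma, or the word itself if there is no entry."""
    return table.get(word, word)


def lemmatize(words, table=LOOKUP):
    """Lemmatize each word independently, with no context and no rules."""
    return [lookup_lemma(w, table) for w in words]


# "Bäume" has no entry in the toy table, so it falls through unchanged.
print(lemmatize(["Ich", "sehe", "Bäume"]))
```

Because each form maps to a single lemma and nothing is derived by rule, a missing or wrong entry cannot be recovered from context, which is the kind of failure described above.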
In the meantime, you might want to check out the spacy-iwnlp extension by @Liebeck and see how it performs on your use case!
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.