Lemmatizer issue for German words


I am confused about the lemmatizer. Take the sentence Ich sehe Bäume (I see trees):

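import spacy

# Load the small German model (it has to be installed separately)
nlp = spacy.load('de_core_news_sm')
doc = nlp(u'Ich sehe Bäume')

for token in doc:
    # token.lemma is the integer hash ID, token.lemma_ is the lemma string
    print(token.text, token.lemma, token.lemma_, token.pos_)
    print("has_vector:", token.has_vector)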

token.lemma_ is just Bäume. I expected it to be lemmatized to the singular form Baum (tree).

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

1 reaction
ines commented, May 25, 2018

Yes, Baum would definitely be correct here. The German lemmatizer only uses lookup tables (and no rule-based process like the English one). This has some limitations – I’ve written a bit more about this in my comment on this thread.
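To illustrate why a lookup-only approach can return Bäume unchanged, here is a minimal sketch of a table-based lemmatizer. It is not spaCy's actual implementation: the table entries and the `lookup_lemma` helper are invented for illustration, and falling back to the surface form for unknown words is an assumption that matches the behaviour described in the question.

```python
# Toy lookup table. The entries are invented for illustration and are
# NOT taken from spaCy's German lemmatization data.
TOY_LOOKUP = {
    "sehe": "sehen",
    "Häuser": "Haus",
}


def lookup_lemma(word, table=TOY_LOOKUP):
    """Return the table entry for `word`, or the word itself if there is none.

    Assumption: a lookup-only lemmatizer has no rules to fall back on,
    so a form missing from the table comes back unchanged.
    """
    return table.get(word, word)


for word in ["Ich", "sehe", "Bäume"]:
    print(word, "->", lookup_lemma(word))
```

In this sketch "sehe" maps to "sehen" because it is in the table, while "Bäume" is returned as-is because it is missing. A rule-based lemmatizer could still try suffix or umlaut rules at that point.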

Another problem is that spaCy will always decide on one lemma (and won’t just give you a bunch of options to choose from). This is convenient, but it also means that the one pick has to be correct. That said, there have definitely been some suspicious reports around the lemmatization performance that might indicate a bug.

In the meantime, you might want to check out the spacy-iwnlp extensions by @Liebeck and see how it performs on your use case!

0 reactions
lock[bot] commented, Aug 5, 2018

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.


