HunspellStemFilter does not work with zero affix
Hello!
Our Hunspell aff file contains some suffix and prefix rules whose affix value is 0. For example, the aff file:
SET UTF-8
TRY esianrtolcdugmphbyfvkwzESIANRTOLCDUGMPHBYFVKWZ’
FLAG long
SFX AA Y 1
SFX AA er 0 er
The dic file:
1
worker/AA
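To make the rule concrete, here is a minimal Python sketch of what `SFX AA er 0 er` does in the forward (generation) direction. It assumes Hunspell's column order `SFX flag strip affix condition`, that `0` in the strip or affix column stands for the empty string, and that every rule line carries an explicit condition. The helper names (`parse_sfx_rules`, `parse_dic_entry`, `generate`) are hypothetical and model only this one feature; they are not Lucene.Net APIs.

```python
import re


def parse_sfx_rules(aff_lines):
    """Collect SFX rule lines as (flag, strip, affix, condition) tuples.

    Simplified model: header lines such as "SFX AA Y 1" (4 columns) are
    skipped, rule lines are assumed to have 5+ columns, and morphological
    fields are ignored. "0" in the strip or affix column means "".
    """
    rules = []
    for line in aff_lines:
        parts = line.split()
        if len(parts) >= 5 and parts[0] == "SFX":
            _, flag, strip, affix, cond = parts[:5]
            rules.append((flag,
                          "" if strip == "0" else strip,
                          "" if affix == "0" else affix,
                          cond))
    return rules


def parse_dic_entry(entry):
    """Split "worker/AA" into the root and its flags (FLAG long: 2 chars each)."""
    word, _, flag_str = entry.partition("/")
    flags = {flag_str[i:i + 2] for i in range(0, len(flag_str), 2)}
    return word, flags


def generate(root, flags, rules):
    """Apply every suffix rule whose flag the root carries."""
    forms = []
    for flag, strip, affix, cond in rules:
        if flag not in flags:
            continue
        # The condition is matched against the end of the root; Hunspell
        # conditions use a regex-like syntax ('.', [abc], [^abc]).
        if re.search(cond + "$", root) and root.endswith(strip):
            # Slice explicitly so an empty strip keeps the whole root.
            forms.append(root[:len(root) - len(strip)] + affix)
    return forms


aff = ["SET UTF-8", "FLAG long", "SFX AA Y 1", "SFX AA er 0 er"]
rules = parse_sfx_rules(aff)
root, flags = parse_dic_entry("worker/AA")
print(generate(root, flags, rules))  # ['work']
```

So `worker/AA` produces the surface form `work`, and stemming `work` is expected to lead back to `worker`.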
So if the word "work" is analyzed, the word "worker" must be returned. But it returns "work": the rule does not apply.
The reason is the affix value 0, because changing the rule to the following fixes the analysis:
SFX AA ker k er
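One plausible way a stemmer could miss the first rule but handle the second is sketched below in Python. This is not Lucene.Net's code: the `stem` function and its `min_affix_len` parameter are hypothetical, and whether Lucene 4.8's stemmer actually starts its suffix search at length 1 is an assumption. The sketch reverses suffix rules: it cuts a candidate suffix off the word, finds rules with that affix, re-appends the stripped text, and checks the dictionary flag and condition. If the search never tries a zero-length suffix, `SFX AA er 0 er` can never match, while `SFX AA ker k er` (a one-character affix) still does.

```python
import re


def stem(word, dictionary, rules, min_affix_len):
    """Return dictionary roots that suffix rules could turn into `word`.

    dictionary maps root -> set of flags; rules are
    (flag, strip, affix, condition) tuples with "0" already mapped to "".
    min_affix_len is the shortest suffix tried: 1 skips zero affixes.
    """
    stems = []
    # n is the length of the suffix removed; at least one char must remain.
    for n in range(min_affix_len, len(word)):
        remainder, suffix = word[:len(word) - n], word[len(word) - n:]
        for flag, strip, affix, cond in rules:
            if affix != suffix:
                continue
            candidate = remainder + strip  # undo the strip
            if (flag in dictionary.get(candidate, set())
                    and re.search(cond + "$", candidate)):
                stems.append(candidate)
    return stems


dictionary = {"worker": {"AA"}}
zero_affix = [("AA", "er", "", "er")]   # SFX AA er 0 er
workaround = [("AA", "ker", "k", "er")]  # SFX AA ker k er

print(stem("work", dictionary, zero_affix, 1))  # []
print(stem("work", dictionary, zero_affix, 0))  # ['worker']
print(stem("work", dictionary, workaround, 1))  # ['worker']
```

When no stem is found, a token filter that passes unknown words through unchanged would emit "work", which matches the behavior described above.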
I found a TODO in the code about fixing zero-affix handling, but I don't understand whether this TODO relates to my problem or to a different one.
If it is the same problem, could you tell me whether there is a plan to fix it soon?
Issue analytics: created 3 years ago; 6 comments (6 by maintainers).
We are releasing 4.8.0, which is not the most stable 4.x version. Hunspell was buggy until at least 4.10, primarily, it seems, because there is no official spec for it, so it took some exploration to make it work. So, we have 3 choices.
IMO, since we have a long road to get to 8.x, we should leave stable software in our wake. That means either patching some parts of 4.8.0 to eliminate known bugs or upgrading the whole project to 4.10.4 before moving on to 8.x. Clearly, of those two options, the patch is simpler.
I’ll do this one. It can be assigned to me.