HunspellStemFilter does not work with zero affix

Hello!

Our Hunspell aff file contains some suffix and prefix rules whose affix value is 0. For example:

aff file

SET UTF-8
TRY esianrtolcdugmphbyfvkwzESIANRTOLCDUGMPHBYFVKWZ’

FLAG long

SFX AA Y 1
SFX AA er 0 er

dic file

1
worker/AA

So if the word work is analyzed, the word worker should be returned. Instead, work is returned: the rule is not applied.

The cause is the affix value 0, because changing the rule as follows fixes the analysis:

SFX AA ker k er
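
To make the expected behaviour concrete, here is a minimal Python sketch of how a stemmer could reverse a suffix rule. It is not the Lucene.NET implementation: the names `SuffixRule`, `parse_sfx` and `stems` are hypothetical, and the rule condition is simplified to a literal ending instead of a full Hunspell condition pattern. It only shows that both rule variants should lead from work back to worker.

```python
from typing import NamedTuple


class SuffixRule(NamedTuple):
    flag: str
    strip: str      # text removed from the dictionary word ("0" = nothing)
    affix: str      # text appended after stripping ("0" = nothing)
    condition: str  # ending required on the dictionary word (simplified to a literal)


def parse_sfx(line):
    """Parse one rule line such as 'SFX AA er 0 er' (hypothetical helper)."""
    _, flag, strip, affix, condition = line.split()
    return SuffixRule(flag,
                      "" if strip == "0" else strip,
                      "" if affix == "0" else affix,
                      condition)


def stems(word, rules, dictionary):
    """Return dictionary entries that can produce `word` through a suffix rule."""
    found = [word] if word in dictionary else []
    for rule in rules:
        # An empty affix matches every word; this is the zero-affix case.
        if not word.endswith(rule.affix):
            continue
        # Undo the rule: drop the affix, then put the stripped text back.
        candidate = word[:len(word) - len(rule.affix)] + rule.strip
        if candidate.endswith(rule.condition) and rule.flag in dictionary.get(candidate, ()):
            found.append(candidate)
    return found


# Dictionary from the dic file above: worker/AA
dictionary = {"worker": {"AA"}}

zero_affix_rule = parse_sfx("SFX AA er 0 er")
workaround_rule = parse_sfx("SFX AA ker k er")

print(stems("work", [zero_affix_rule], dictionary))
print(stems("work", [workaround_rule], dictionary))
```

In this sketch both calls find worker. The zero-affix rule needs no special handling beyond treating 0 as an empty string, which is the behaviour I expected from the filter.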

I found a TODO in the code about fixing zero-affix handling, but I can't tell whether it relates to my problem or to a different one.

If it is the same problem, could you tell me whether there is a plan to fix it soon?

Issue Analytics

Created 3 years ago
Comments: 6 (6 by maintainers)

Top GitHub Comments

NightOwl888 commented, Feb 22, 2021

We are releasing 4.8.0, which is not the most stable 4.x version. Hunspell was buggy until at least 4.10, primarily, it seems, because there is no official spec for it, so it took some exploration to make it work. So, we have 3 choices.

Strictly follow Lucene 4.8.0 including all of its bugs.
Patch Lucene 4.8.0 in some regards to make it less buggy.
Plan to release 4.10.4 before moving on to 8.x.

IMO, since we have a long road to get to 8.x, we should leave stable software in our wake. That means either patching some parts of 4.8.0 to eliminate known bugs or doing a full upgrade of the whole project to 4.10.4 before moving on to 8.x. Clearly, of those two options, the patch is simpler.

rclabo commented, Feb 16, 2021

I’ll do this one. It can be assigned to me.
