HunspellStemFilter does not work with zero affix

Hello!

Our Hunspell aff file contains some suffix and prefix rules whose affix field is 0. For example, the aff file:

SET UTF-8
TRY esianrtolcdugmphbyfvkwzESIANRTOLCDUGMPHBYFVKWZ’

FLAG long

SFX AA Y 1
SFX AA er 0 er

and the dic file:

1
worker/AA

So when the word work is analyzed, the stem worker should be returned. Instead, work is returned: the rule is not applied.

The cause appears to be the affix value 0, because rewriting the rule with a non-empty affix fixes the analysis:

SFX AA ker k er
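
To make the expected behaviour concrete, here is a minimal Python sketch of how a stemmer could reverse an SFX rule. It is an illustration only, not the Lucene.NET implementation: the names SuffixRule, parse_sfx and stems are hypothetical, conditions are treated as plain regular expressions, and only rule lines (not the "SFX AA Y 1" header) are parsed. The point is that a 0 in the affix field means "append nothing", so every word is a candidate and the stripped characters just have to be added back.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class SuffixRule:
    """One SFX rule line: SFX <flag> <strip> <affix> <condition> (hypothetical helper type)."""
    flag: str
    strip: str      # characters removed from the stem before the affix is appended
    affix: str      # characters appended; "0" in the aff file means the empty string
    condition: str  # pattern the stem must end with


def parse_sfx(line: str) -> SuffixRule:
    """Parse a single SFX rule line (not the header line), mapping "0" to ""."""
    _, flag, strip, affix, condition = line.split()
    return SuffixRule(flag,
                      "" if strip == "0" else strip,
                      "" if affix == "0" else affix,
                      condition)


def stems(word: str, rules: list, dictionary: dict) -> list:
    """Return dictionary stems that could produce `word` through one suffix rule."""
    found = [word] if word in dictionary else []
    for rule in rules:
        # An empty affix matches every word, so zero-affix rules are always tried.
        if not word.endswith(rule.affix):
            continue
        # Undo the rule: drop the appended affix, then restore the stripped characters.
        base = word[:len(word) - len(rule.affix)] + rule.strip
        if not re.search("(?:" + rule.condition + ")$", base):
            continue
        if rule.flag in dictionary.get(base, ()) and base not in found:
            found.append(base)
    return found


dictionary = {"worker": {"AA"}}                 # from the dic file: worker/AA
zero_affix = parse_sfx("SFX AA er 0 er")        # the rule that fails in the report
workaround = parse_sfx("SFX AA ker k er")       # the rewritten rule that works

print(stems("work", [zero_affix], dictionary))  # ['worker']
print(stems("work", [workaround], dictionary))  # ['worker']
```

Under these assumptions both rules describe the same transformation (worker becomes work), so both should stem work back to worker.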

I found a TODO in the code about fixing zero-affix handling, but I can't tell whether it relates to my problem or to a different one.

If it is the same problem, could you tell me whether there is a plan to fix it soon?

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 6 (6 by maintainers)

Top GitHub Comments

NightOwl888 commented, Feb 22, 2021

We are releasing 4.8.0, which is not the most stable 4.x version. Hunspell was buggy until at least 4.10, primarily, it seems, because there is no official spec for it, so it took some exploration to make it work. So, we have 3 choices.

  1. Strictly follow Lucene 4.8.0 including all of its bugs.
  2. Patch Lucene 4.8.0 in some regards to make it less buggy.
  3. Plan to release 4.10.4 before moving on to 8.x.

IMO, since we have a long road to get to 8.x, we should leave stable software in our wake. That means either patching some parts of 4.8.0 to eliminate known bugs or doing a full upgrade of the whole project to 4.10.4 before moving on to 8.x. Clearly, of the two options, the patch is simpler.

rclabo commented, Feb 16, 2021

I’ll do this one. It can be assigned to me.
