Hunspell affix file parsing corrupts some affix conditions
Hello!
Our Hunspell aff file contains some suffix and prefix rules that get corrupted during parsing. For example:
SFX AA a b [c]d
The condition `[c]d` is a correct regular expression, but this rule does not work.
The problem is in the Hunspell dictionary class, in the ParseAffix method (place in code).
We can see that if an affix's condition starts with `[` and does not end with `]`, then a `]` is appended to the end of the condition. So the condition becomes the incorrect regular expression `[c]d]`.
Maybe the `!condition.EndsWith("]", StringComparison.Ordinal)` check there should be replaced with a `condition.IndexOf("]", StringComparison.Ordinal) == -1` check. But I don't know the reasons behind the current code, so I may be wrong.
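The difference between the two checks can be illustrated with a small standalone sketch. It is written in Java rather than C#, it is not the library's actual code, and the helper names (`repairWithEndsWith`, `repairWithIndexOf`, `suffixMatches`) are hypothetical. It also assumes that a suffix condition is matched against the end of a word by prepending `.*` to it.

```java
import java.util.regex.Pattern;

class AffixConditionSketch {
    // Mirrors the check described in the issue: "]" is appended whenever the
    // condition starts with "[" and does not END with "]".
    static String repairWithEndsWith(String condition) {
        if (condition.startsWith("[") && !condition.endsWith("]")) {
            return condition + "]";
        }
        return condition;
    }

    // Mirrors the proposed check: "]" is appended only when the condition
    // starts with "[" and contains no "]" at all.
    static String repairWithIndexOf(String condition) {
        if (condition.startsWith("[") && condition.indexOf(']') == -1) {
            return condition + "]";
        }
        return condition;
    }

    // Assumption: a suffix condition must match the end of the word.
    static boolean suffixMatches(String condition, String word) {
        return Pattern.matches(".*" + condition, word);
    }

    public static void main(String[] args) {
        String condition = "[c]d";
        // The EndsWith-style check turns "[c]d" into "[c]d]".
        System.out.println(repairWithEndsWith(condition));
        // The IndexOf-style check leaves "[c]d" unchanged.
        System.out.println(repairWithIndexOf(condition));
        // Only the unchanged condition accepts a word ending in "cd".
        System.out.println(suffixMatches(repairWithEndsWith(condition), "abcd"));
        System.out.println(suffixMatches(repairWithIndexOf(condition), "abcd"));
        // A condition with a genuinely unclosed bracket is repaired by both.
        System.out.println(repairWithIndexOf("[ab"));
    }
}
```

In this sketch the trailing `]` added by the first check is treated as a literal character, so the rule silently stops matching instead of raising an error.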
Great. I ran the tests in Lucene 4.8.0 and they fail; however, in Lucene 8.2.0 they succeed. I checked the source: at least as far back as Lucene 4.10.4, the line in question was changed to use `IndexOf`, just as you have suggested.
So, if we upgrade `Lucene.Net.Hunspell` to 4.10.4 as I proposed in #419, we will patch this problem automatically.

Yes, I didn't attach them, but I wrote their content right in my comment.