
Hunspell affixes' file parsing corrupts some affixes' conditions

See original GitHub issue

Hello!

Our Hunspell .aff file contains some suffix and prefix rules that get corrupted during parsing. For example:

SFX AA a b [c]d

The condition [c]d is a valid regular expression, but this rule does not work. The problem is in the ParseAffix method of the Hunspell dictionary class (place in code).

We can see that if an affix's condition starts with [ and does not end with ], a ] is appended to the condition. So the condition becomes the incorrect expression [c]d].

Maybe the !condition.EndsWith("]", StringComparison.Ordinal) check should be replaced with a condition.IndexOf("]", StringComparison.Ordinal) == -1 check. But I don't know the reasons behind the current code, so I may be wrong.
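
To make the difference concrete, here is a minimal sketch that compares the two checks on the condition from the example rule. It is not the actual Lucene.NET source: the class name AffixConditionSketch and the helper names RepairWithEndsWith and RepairWithIndexOf are hypothetical, and the surrounding StartsWith test is assumed from the description above.

```csharp
using System;

// Hypothetical sketch, not the Lucene.NET ParseAffix implementation.
public static class AffixConditionSketch
{
    // Behaviour described in the issue: a ']' is appended whenever the
    // condition starts with '[' and does not END with ']'.
    public static string RepairWithEndsWith(string condition)
    {
        if (condition.StartsWith("[", StringComparison.Ordinal) &&
            !condition.EndsWith("]", StringComparison.Ordinal))
        {
            return condition + "]";
        }
        return condition;
    }

    // Proposed check: append ']' only when the condition contains
    // no ']' at all, i.e. the bracket is genuinely unclosed.
    public static string RepairWithIndexOf(string condition)
    {
        if (condition.StartsWith("[", StringComparison.Ordinal) &&
            condition.IndexOf("]", StringComparison.Ordinal) == -1)
        {
            return condition + "]";
        }
        return condition;
    }

    public static void Main()
    {
        Console.WriteLine(RepairWithEndsWith("[c]d")); // [c]d]  (corrupted)
        Console.WriteLine(RepairWithIndexOf("[c]d"));  // [c]d   (left intact)
        Console.WriteLine(RepairWithIndexOf("[abc"));  // [abc]  (unclosed bracket still repaired)
    }
}
```

With the IndexOf variant, a condition such as [c]d passes through unchanged, while a condition whose bracket is never closed is still repaired.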

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

1 reaction
NightOwl888 commented, Feb 21, 2021

Great. I ran the tests against Lucene 4.8.0 and they fail, but against Lucene 8.2.0 they succeed. I checked the source: at least as far back as Lucene 4.10.4, the line in question was changed to use IndexOf, just as you suggested.

So, if we upgrade Lucene.Net.Hunspell to 4.10.4 as I proposed in #419, we will patch this problem automatically.

1 reaction
SergeyKotyushkin commented, Feb 13, 2021

Thanks. The test looks good, but it seems you didn’t attach the condition-issue-418.aff and .condition-issue-418.dic files.

Yes, I didn’t attach them but I wrote their content right in my comment.


