Cyrillic & latin symbols same time search

According to https://github.com/nextapps-de/flexsearch/issues/51, we can use the options below to search by Cyrillic symbols:

{
    encode: false,
    split: /\s+/,
    tokenize: "reverse"
}

But these options break searching by Latin symbols.

For example, searching for бренд Microsoft ("brand Microsoft" in English) doesn't work.

How can I fix it?

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Reactions: 1
  • Comments: 9 (1 by maintainers)

Top GitHub Comments

gmfmi commented, Jul 7, 2020 (2 reactions)

Hi @japanes,

I had the same issue and finally decided to look into the source code to find a solution that does not require recompiling the code.

Basically, we would like to use the simple encoder (instead of setting encode to false), but it removes any characters other than Latin letters, numbers and spaces. I tweaked the regexp patterns a little, and now everything works. You can search in both Latin and Cyrillic!

There is probably a prettier way to do it, but this should work for you:

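{
    split: /\s+/,
    tokenize: "reverse",
    encode: function(str) {
        // Ordered [pattern, replacement] pairs. An array is used because an
        // object literal cannot hold two " " keys: the later one silently
        // overwrites the earlier one, so the /[-/]/ rule would never run.
        var regexp_replacements = [
            [/[àáâãäå]/g, "a"],
            [/[èéêë]/g, "e"],
            [/[ìíîï]/g, "i"],
            [/[òóôõöő]/g, "o"],
            [/[ùúûüű]/g, "u"],
            [/[ýŷÿ]/g, "y"],
            [/ñ/g, "n"],
            [/[ç]/g, "c"],
            [/ß/g, "s"],
            // Hyphens and slashes separate words.
            [/[-/]/g, " "],
            // Strip the remaining ASCII punctuation.
            [/['!"#$%&\\()\*+,-./:;<=>?@[\]^_`{|}~]/g, ""],
            // Collapse runs of whitespace.
            [/\s+/g, " "],
        ];
        str = str.toLowerCase();
        for (var pair of regexp_replacements) {
            str = str.replace(pair[0], pair[1]);
        }
        return str === " " ? "" : str;
    }
}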

Hope that helps 😉
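To see what such an encoder does to mixed input, here is a minimal standalone sketch that runs without FlexSearch. The name `encodeMixed` and the sample strings are hypothetical and used only for illustration. The accent list is shortened, the rules live in an ordered array so each one is applied in sequence, and the result is trimmed, which differs slightly from the snippet above.

```javascript
// Ordered [pattern, replacement] rules; nothing here touches Cyrillic letters.
const rules = [
  [/[àáâãäå]/g, "a"],
  [/[èéêë]/g, "e"],
  [/[òóôõöő]/g, "o"],
  // Hyphens and slashes become word separators.
  [/[-/]/g, " "],
  // Remaining ASCII punctuation is removed.
  [/['!"#$%&\\()*+,.:;<=>?@[\]^_`{|}~]/g, ""],
  // Runs of whitespace collapse to a single space.
  [/\s+/g, " "],
];

// Hypothetical helper: lowercases the input, then applies each rule in order.
function encodeMixed(str) {
  let out = str.toLowerCase();
  for (const [pattern, replacement] of rules) {
    out = out.replace(pattern, replacement);
  }
  return out.trim();
}

console.log(encodeMixed("Бренд Microsoft")); // бренд microsoft
console.log(encodeMixed("Café-Пушкин!"));    // cafe пушкин
```

Both scripts end up lowercased in the same string, so a query that mixes them is split into tokens the same way as the indexed text.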

angeloashmore commented, Apr 15, 2021 (1 reaction)

@mryodo I realize I’m replying almost a year later 😅, but you can pass a FlexSearch instance directly to useFlexSearch. By doing that, you shouldn’t need to have a hard-coded edited version of the hook in your project.

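const importedIndex = FlexSearch.create({
  split: /\s+/,
  tokenize: "reverse",
  encode: function (str) {
    // Ordered [pattern, replacement] pairs. An array is used because an
    // object literal cannot hold two " " keys: the later one silently
    // overwrites the earlier one, so the /[-/]/ rule would never run.
    var regexp_replacements = [
      [/[àáâãäå]/g, "a"],
      [/[èéêë]/g, "e"],
      [/[ìíîï]/g, "i"],
      [/[òóôõöő]/g, "o"],
      [/[ùúûüű]/g, "u"],
      [/[ýŷÿ]/g, "y"],
      [/ñ/g, "n"],
      [/[ç]/g, "c"],
      [/ß/g, "s"],
      // Hyphens and slashes separate words.
      [/[-/]/g, " "],
      // Strip the remaining ASCII punctuation.
      [/['!"#$%&\\()\*+,-./:;<=>?@[\]^_`{|}~]/g, ""],
      // Collapse runs of whitespace.
      [/\s+/g, " "],
    ]
    str = str.toLowerCase()
    for (var pair of regexp_replacements) {
      str = str.replace(pair[0], pair[1])
    }
    return str === " " ? "" : str
  },
})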

importedIndex.import(yourExistingIndex)

const results = useFlexSearch(query, importedIndex)

Ideally, instantiate the index outside your React component and call importedIndex.import only once within the component for better performance.
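One way to guarantee the import runs only once is to wrap the setup in a memoizing getter at module level. The sketch below is an assumption-laden illustration, not part of FlexSearch or of the hook: `once`, `getIndex`, and `fakeIndex` are hypothetical names, and `fakeIndex` is a stand-in object that only counts `import` calls so the example runs on its own.

```javascript
// Hypothetical helper: wraps a setup function so it runs at most once.
function once(setup) {
  let called = false;
  let result;
  return function () {
    if (!called) {
      result = setup();
      called = true;
    }
    return result;
  };
}

// Stand-in for a real index; it only records how often import() is called.
const fakeIndex = {
  importCalls: 0,
  import: function (serialized) {
    this.importCalls += 1;
    this.data = serialized;
  },
};

// Module-level getter: the first call imports, later calls reuse the result.
const getIndex = once(function () {
  fakeIndex.import("serialized-index-data");
  return fakeIndex;
});

getIndex();
getIndex();
console.log(fakeIndex.importCalls); // 1
```

In a real project, the setup function would create the index with the custom encoder and import the existing serialized data; the component would then call the getter on each render without repeating that work.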
