Cyrillic languages support

Hello,

I’ve run into the following behaviour.

This example works as expected:

const FlexSearch = require('flexsearch');
const index = new FlexSearch();

index.add(1, 'Foobar')
console.log(index.search('Foobar'));
// [ 1 ]

But this one shows no results.

const FlexSearch = require('flexsearch');
const index = new FlexSearch();

index.add(1, 'Фообар')
console.log(index.search('Фообар'));
// []

I’ve tested this in both Node and the browser.

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 11 (5 by maintainers)

Top GitHub Comments

4 reactions
gmfmi commented, May 18, 2020

Hi @ts-thomas, last year you said:

The next main release will get an improvement of handling all language-specific features.

As FlexSearch is currently at the 0.7.0 release candidate, I was wondering whether there are any new features for language processing.

I recently created a project called SearchinGhost, an in-browser search plugin for Ghost CMS powered by FlexSearch. I am really happy with FlexSearch (thank you for the work done!) but I would like to add multi-language capabilities. For now, based on what I have read in the issues, I have come up with these default options per language:

Latin:

FlexSearch.create({
    encode: "simple",
    tokenize: "forward"
});

Arabic:

FlexSearch.create({
    encode: false,
    rtl: true,
    split: /\s+/,
    tokenize: "forward"
});

Cyrillic, Indian languages (any language whose words are separated by spaces):

FlexSearch.create({
    encode: false,
    split: /\s+/,
    tokenize: "forward"
});
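
A small plain-JavaScript sketch of why the `split: /\s+/` option may matter for Cyrillic text. It assumes the index otherwise splits on a `\W+`-style pattern, which this thread does not state. What is certain is the regex behaviour itself: without the `u` flag, `\W` matches everything outside `[A-Za-z0-9_]`, so every Cyrillic letter counts as a separator. The sample string is made up for illustration.

```javascript
// Plain JavaScript, no FlexSearch required.
// Sample text chosen for illustration only.
const text = 'Фообар поиск';

// \W (without the `u` flag) matches anything outside [A-Za-z0-9_],
// so the whole Cyrillic string is consumed as one long separator.
const byNonWord = text.split(/\W+/).filter(Boolean);

// Splitting on whitespace keeps each Cyrillic word intact.
const byWhitespace = text.split(/\s+/).filter(Boolean);

console.log(byNonWord);    // []
console.log(byWhitespace); // [ 'Фообар', 'поиск' ]
```

If the default split does behave like `\W+`, no terms would reach the index for Cyrillic input, which would be consistent with the empty result reported in the original question.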

Chinese, Japanese or Korean (dedicated characters, no spaces between words):

FlexSearch.create({
    encode: false,
    tokenize: function(str){
        return str.replace(/[\x00-\x7F]/g, "").split("");
    }
});
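
To see what the custom tokenizer above actually produces, here it is extracted into a standalone function and run outside FlexSearch. The function names `tokenizeCjk` and `tokenizeCjkByCodePoint` are hypothetical, and the second function is only a possible variant, not something suggested in the thread. It differs in one respect: `split("")` cuts on UTF-16 code units, while `Array.from` iterates by code point.

```javascript
// The tokenizer from the configuration above, as a named function
// (the name is hypothetical).
function tokenizeCjk(str) {
    // Drop all ASCII characters (Latin letters, digits, spaces, punctuation),
    // then emit one token per UTF-16 code unit.
    return str.replace(/[\x00-\x7F]/g, "").split("");
}

// Possible variant (an assumption, not from the thread):
// Array.from iterates by code point, so characters outside the
// Basic Multilingual Plane stay in one piece.
function tokenizeCjkByCodePoint(str) {
    return Array.from(str.replace(/[\x00-\x7F]/g, ""));
}

console.log(tokenizeCjk("Hello 世界"));                   // [ '世', '界' ]
console.log(tokenizeCjk("\u{20BB7}").length);            // 2 (surrogate pair is split)
console.log(tokenizeCjkByCodePoint("\u{20BB7}").length); // 1
```

Note that this tokenizer discards all ASCII content, so Latin words embedded in CJK text would not be searchable with this configuration.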

Do you see any possible improvements or optimisations?

EDIT: I finally found the relevant documentation for v0.7.0: https://github.com/nextapps-de/flexsearch/blob/0.7.0/doc/0.7.0.md. I hope this version will be released one day 😃

3 reactions
ts-thomas commented, May 31, 2019

Also consider using the “rtl” option for right-to-left support:

var index = FlexSearch.create({
    encode: false,
    rtl: true,
    split: /\s+/,
    tokenize: "forward"
});