Problem with search in Russian with capital letters

Search in Russian is case-sensitive (unlike English), but the search string is normalized to lower case. So if I search for a word like Привет in the message Привет, Мир!, I get an empty result.

Reproduced on a fresh self-hosted install on Ubuntu 20.04 with the latest Zulip release, 3.2, from https://www.zulip.org/dist/releases/zulip-server-latest.tar.gz.

Zulip was installed following this guide: https://zulip.readthedocs.io/en/latest/production/install.html
Multi-language search was enabled following: https://zulip.readthedocs.io/en/latest/subsystems/full-text-search.html
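The reported symptom can be modeled in a few lines of plain Python. This is only an illustrative sketch, not Zulip or PGroonga code: `naive_search` and `normalized_search` are hypothetical names, and the sketch assumes the failure comes from lower-casing the query while the Russian text is matched as-is.

```python
def naive_search(query: str, text: str) -> bool:
    """Lower-case only the query, then match case-sensitively (the reported symptom)."""
    return query.lower() in text


def normalized_search(query: str, text: str) -> bool:
    """Normalize both the query and the text before matching."""
    return query.casefold() in text.casefold()


message = "Привет, Мир!"

# The query becomes "привет", but the message still starts with a capital "П".
print(naive_search("Привет", message))

# Both sides are case-folded, so the word is found.
print(normalized_search("Привет", message))
```

Under these assumptions the first call finds nothing and the second one matches, which mirrors the empty result described above.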

[Screen recording: Peek 2020-10-23 11-13]

Issue Analytics

  • State: open
  • Created 3 years ago
  • Comments: 7 (5 by maintainers)

Top GitHub Comments

kou commented, Dec 22, 2020 (2 reactions)

Ah, we also need to specify the normalizer in the highlight function we use: https://pgroonga.github.io/reference/functions/pgroonga-match-positions-character.html

It doesn’t yet have a way to customize the normalizer. I’ll improve PGroonga and release a new version. Then we can add support for highlighting Russian text.
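To see why the highlight step needs the same normalizer as the search step, here is a simplified pure-Python stand-in. It is not PGroonga's implementation: `match_positions` is a hypothetical helper that returns `[offset, length]` pairs in characters, and `str.lower` is assumed as the normalizer because it keeps Cyrillic strings the same length, so offsets stay valid.

```python
from typing import Callable, List


def match_positions(text: str, keywords: List[str],
                    normalize: Callable[[str], str]) -> List[List[int]]:
    """Return [offset, length] pairs (in characters) for every keyword hit."""
    normalized_text = normalize(text)
    positions = []
    for keyword in keywords:
        needle = normalize(keyword)
        if not needle:
            continue  # skip empty keywords to avoid an endless loop
        start = normalized_text.find(needle)
        while start != -1:
            positions.append([start, len(needle)])
            start = normalized_text.find(needle, start + len(needle))
    return sorted(positions)


text = "Привет, Мир!"
keywords = ["привет"]  # the search side has already lower-cased the query

# Highlighting without a normalizer: the search matched, but nothing is highlighted.
print(match_positions(text, keywords, lambda s: s))

# Highlighting with the same normalization as the search: the word is located.
print(match_positions(text, keywords, str.lower))
```

With the identity normalizer the list of positions is empty, while with `str.lower` the word is located at the start of the text. That is the mismatch the comment describes between a search that matches and a highlight that does not.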

kou commented, Dec 28, 2020 (0 reactions)

I’ve implemented it and released PGroonga 2.2.8.

The following change will fix it:

diff --git a/zerver/views/message_fetch.py b/zerver/views/message_fetch.py
index 2b6a0a6894..9a2d519aef 100644
--- a/zerver/views/message_fetch.py
+++ b/zerver/views/message_fetch.py
@@ -471,10 +471,13 @@ class NarrowBuilder:
         query_extract_keywords = func.pgroonga_query_extract_keywords
         operand_escaped = func.escape_html(operand)
         keywords = query_extract_keywords(operand_escaped)
+        index_name = "zerver_message_search_pgroonga"
         query = query.column(match_positions_character(column("rendered_content", Text),
-                                                       keywords).label("content_matches"))
+                                                       keywords,
+                                                       index_name).label("content_matches"))
         query = query.column(match_positions_character(func.escape_html(topic_column_sa()),
-                                                       keywords).label("topic_matches"))
+                                                       keywords,
+                                                       index_name).label("topic_matches"))
         condition = column("search_pgroonga").op("&@~")(operand_escaped)
         return query.where(maybe_negate(condition))

We also need to run ALTER EXTENSION pgroonga UPDATE; as a superuser after upgrading the installed PGroonga, but our Puppet configuration doesn’t have that feature yet…
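Until that step is automated, it could be scripted by hand. The following is a minimal sketch, not part of Zulip's tooling: `build_update_command` and `run_update` are hypothetical helpers, the database name `zulip` is an assumption, and the script must be run as a user that can connect to PostgreSQL as a superuser.

```python
import subprocess
from typing import List

# The statement quoted in the comment above.
UPDATE_SQL = "ALTER EXTENSION pgroonga UPDATE;"


def build_update_command(database: str = "zulip") -> List[str]:
    """Build a psql invocation that runs the extension update.

    The default database name "zulip" is an assumption; adjust it as needed.
    """
    return ["psql", "-v", "ON_ERROR_STOP=1", "-d", database, "-c", UPDATE_SQL]


def run_update(database: str = "zulip") -> None:
    """Run the update; raises CalledProcessError if psql exits non-zero."""
    subprocess.run(build_update_command(database), check=True)
```

Calling `run_update()` after the PGroonga package upgrade would apply the extension update. Nothing is executed at import time, so the command can be inspected first with `build_update_command()`.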
