Feature request: Treat the Chinese text as a Chinese sequence when using `Ctrl+Left/Right`

Currently, VSCode treats a long run of Chinese text as one “word”. Each time you press Ctrl+Left/Right, the cursor jumps to the beginning or end of the whole run.

The feature request is to treat Chinese text as a sequence of characters, so that each Ctrl+Left/Right moves the cursor just one step. This is the default behavior of the system's text programs.

Example (| marks the cursor):

|本文的学习公式
// Ctrl+Right
本文的学习公式|

Expected:

|本文的学习公式
// Ctrl+Right
本|文的学习公式
// Ctrl+Right
本文|的学习公式
// Ctrl+Right
本文的|学习公式

(Of course, it would be even better if it could support word segmentation.)
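
The expected behavior above could be sketched roughly as follows. This is only an illustration, not how VSCode computes cursor stops: `ctrlRight` and `isCjkIdeograph` are hypothetical names, only the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) is checked, and a plain space is the only separator for non-Chinese text.

```typescript
// Hypothetical helper: true for characters in the basic CJK Unified Ideographs
// block. Extension blocks and surrogate pairs are ignored (an assumption
// made to keep the sketch short).
function isCjkIdeograph(ch: string): boolean {
  const code = ch.charCodeAt(0);
  return code >= 0x4e00 && code <= 0x9fff;
}

// Hypothetical model of the requested Ctrl+Right: returns the new cursor offset.
function ctrlRight(text: string, offset: number): number {
  if (offset >= text.length) return text.length;
  // On a Chinese character, move exactly one step.
  if (isCjkIdeograph(text.charAt(offset))) return offset + 1;
  let i = offset;
  // Skip any spaces in front of the cursor.
  while (i < text.length && text.charAt(i) === " ") i++;
  // Then skip to the end of the current run of non-Chinese, non-space characters.
  while (i < text.length && text.charAt(i) !== " " && !isCjkIdeograph(text.charAt(i))) i++;
  return i;
}

// Walk the cursor through the example sentence, three presses of Ctrl+Right.
const sample = "本文的学习公式";
let cursor = 0;
for (let press = 0; press < 3; press++) {
  cursor = ctrlRight(sample, cursor);
  console.log(sample.slice(0, cursor) + "|" + sample.slice(cursor));
}
```

Non-Chinese words still move as a whole in this sketch, so mixed text such as identifiers next to Chinese comments keeps the usual word jumps.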

Issue Analytics

State:
Created 5 years ago
Reactions: 18
Comments: 10 (3 by maintainers)

Top GitHub Comments

7 reactions

smikitky commented, Jan 15, 2019

This is a longstanding problem which virtually all East-Asian developers will notice once they start editing natural sentences (say, in Markdown) on vscode. I think this is fundamentally a problem of wrong word-splitting for CJK languages (and perhaps Thai, too), which use no spaces to delimit words. A similar problem happens when you double-click a word in a line (the whole line will be selected instead of the target word) and when you trigger an autocompletion using <kbd>Ctrl</kbd>+<kbd>Space</kbd> (a whole line will be shown as a candidate).

Ideally, dictionary-based word segmentation is desirable (this is available on MS Word, Google Chrome browser, etc), but it’s not 100% correct, and I’m not sure if it is really necessary for a code editor. Another practical approach that works at least in Japanese is to split words based on character types, because a typical Japanese text is a mixture of kanji, hiragana and katakana (This algorithm is implemented on most domestic text editors and even MS Notepad.exe). Character types can be easily determined via Unicode code points.

Example:

(1) 吾輩は猫である。名前はまだない。
(2) 吾輩|は|猫|で|ある|。|名前|は|まだ|ない|。
(3) 吾輩|は|猫|である|。|名前|はまだない|。

(1): Natural Japanese text with two sentences. 。 is a Japanese period.
(2): Dictionary-based word boundaries (|), available in MS Word, Chrome, etc.
(3): Codepoint-based kana-kanji boundaries, available in Firefox, Notepad.exe, etc.
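
To make approach (3) concrete, here is a minimal sketch of splitting on character type using Unicode code points. It is not taken from any editor or extension: `classify` and `splitByCharClass` are hypothetical names, and the block ranges cover only the basic kanji, hiragana, katakana, and CJK punctuation blocks in the Basic Multilingual Plane.

```typescript
type CharClass = "kanji" | "hiragana" | "katakana" | "punctuation" | "other";

// Classify a single UTF-16 code unit by Unicode block.
// Assumption: only the basic blocks are covered; extension blocks,
// half-width kana, and surrogate pairs all fall into "other".
function classify(ch: string): CharClass {
  const code = ch.charCodeAt(0);
  if (code >= 0x4e00 && code <= 0x9fff) return "kanji";       // CJK Unified Ideographs
  if (code >= 0x3040 && code <= 0x309f) return "hiragana";    // Hiragana
  if (code >= 0x30a0 && code <= 0x30ff) return "katakana";    // Katakana
  if (code >= 0x3000 && code <= 0x303f) return "punctuation"; // CJK Symbols and Punctuation
  return "other";
}

// Split text wherever the character class changes between neighbours.
function splitByCharClass(text: string): string[] {
  const segments: string[] = [];
  let start = 0;
  for (let i = 1; i <= text.length; i++) {
    const atEnd = i === text.length;
    if (atEnd || classify(text.charAt(i)) !== classify(text.charAt(i - 1))) {
      segments.push(text.slice(start, i));
      start = i;
    }
  }
  return segments;
}

// Reproduces line (3) of the example above.
console.log(splitByCharClass("吾輩は猫である。名前はまだない。").join("|"));
// → 吾輩|は|猫|である|。|名前|はまだない|。
```

Because the rule needs no dictionary, it is cheap to run on every cursor move, at the cost of merging runs such as はまだない that a dictionary would split further.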

There is already a popular extension that does (3) above for Japanese text. Unfortunately, it works on <kbd>Ctrl</kbd>+<kbd>←</kbd>/<kbd>→</kbd> but nowhere else. It does not work on double-clicks, <kbd>Ctrl</kbd>+<kbd>D</kbd>, autocompletion, text search, and so on.

Personally, I think (3) should be implemented as part of the basic functionality of VSCode, considering that it’s available in any other decent text editor. The dictionary-based solution (2) may be too costly to maintain within the main vscode repository, but I hope there is a way to allow extension developers to override the word-boundary detection algorithm or the double-click behavior.

By the way, in the meantime, you can alleviate this problem by tweaking the "editor.wordSeparators" setting and adding multibyte punctuation marks such as 。. With this, you can at least stop the cursor at (double-byte) periods and commas using <kbd>Ctrl</kbd>+<kbd>←</kbd>/<kbd>→</kbd>.
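
As a rough illustration of this workaround, the snippet below builds a value for "editor.wordSeparators" by appending full-width punctuation to an existing separator string. The starting string is what I believe the default to be, so check it against your own settings; the list of punctuation marks and the helper name `withCjkPunctuation` are likewise just assumptions for the example.

```typescript
// Assumed default value of "editor.wordSeparators"; verify it in your own VSCode settings.
const ASSUMED_DEFAULT_SEPARATORS = "`~!@#$%^&*()-=+[{]}\\|;:'\",.<>/?";

// Full-width punctuation to add (an illustrative selection, not an exhaustive list).
const CJK_PUNCTUATION = "。、，．！？：；「」『』（）";

// Hypothetical helper: append each punctuation mark that is not already present.
function withCjkPunctuation(separators: string): string {
  let result = separators;
  for (let i = 0; i < CJK_PUNCTUATION.length; i++) {
    const ch = CJK_PUNCTUATION.charAt(i);
    if (result.indexOf(ch) === -1) result += ch;
  }
  return result;
}

// Print a fragment that can be pasted into settings.json.
const settingsFragment = {
  "editor.wordSeparators": withCjkPunctuation(ASSUMED_DEFAULT_SEPARATORS),
};
console.log(JSON.stringify(settingsFragment, null, 2));
```

This only makes the cursor stop at the listed punctuation; text between two marks is still treated as one word.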

6 reactions

rebornix commented, Nov 19, 2019

Let’s see if we can have time for it during holiday time.