
Feature request: Treat Chinese text as a sequence of characters when using `Ctrl+Left/Right`


Currently, VSCode treats a long run of Chinese text as a single "word". Each press of Ctrl+Left/Right moves the cursor to the beginning or end of the entire run.

The request is to treat Chinese text as a sequence of characters, so that each Ctrl+Left/Right moves the cursor by only one step. This is the default behavior of the operating system's own text programs.

Current behavior (| marks the cursor):

|本文的学习公式
// Ctrl+Right
本文的学习公式|

Expected:

|本文的学习公式
// Ctrl+Right
本|文的学习公式
// Ctrl+Right
本文|的学习公式
// Ctrl+Right
本文的|学习公式
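
A minimal TypeScript sketch of the requested stepping, assuming a simplified model in which every character in the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) is its own word. The names `isHan`, `isAsciiWordChar`, and `nextWordStop` are hypothetical and are not part of VS Code's API; real editor word navigation also handles surrogate pairs, extension blocks, and the `editor.wordSeparators` setting, all of which this sketch ignores.

```typescript
// Hypothetical helper: true for a UTF-16 code unit in the basic
// CJK Unified Ideographs block (U+4E00 to U+9FFF). Extension blocks and
// surrogate pairs are deliberately ignored in this sketch.
function isHan(ch: string): boolean {
  const cp = ch.charCodeAt(0);
  return cp >= 0x4e00 && cp <= 0x9fff;
}

// Hypothetical helper: ASCII letters, digits and underscore form ordinary words.
function isAsciiWordChar(ch: string): boolean {
  return /[A-Za-z0-9_]/.test(ch);
}

// Returns the offset where a Ctrl+Right-style move starting at `offset` stops.
// Whitespace is skipped first; then a Han ideograph counts as a one-character
// word, a run of ASCII word characters counts as one word, and any other
// character is stepped over individually.
function nextWordStop(text: string, offset: number): number {
  let i = offset;
  while (i < text.length && /\s/.test(text[i])) i++;
  if (i >= text.length) return text.length;
  if (isHan(text[i])) return i + 1;
  if (isAsciiWordChar(text[i])) {
    while (i < text.length && isAsciiWordChar(text[i])) i++;
    return i;
  }
  return i + 1;
}

// Usage: step through the example line from the issue.
const line = "本文的学习公式";
const stops: number[] = [];
let pos = 0;
while (pos < line.length) {
  pos = nextWordStop(line, pos);
  stops.push(pos);
}
console.log(stops.join(",")); // 1,2,3,4,5,6,7
```

Each stop matches one step of the expected sequence above, while ordinary ASCII identifiers such as `foo` are still skipped as a whole.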

(Of course, it would be even better if it supported proper word segmentation.)

Issue Analytics

  • State: open
  • Created 5 years ago
  • Reactions: 18
  • Comments: 10 (3 by maintainers)

Top GitHub Comments

smikitky commented on Jan 15, 2019 (7 reactions)

This is a longstanding problem which virtually all East-Asian developers will notice once they start editing natural sentences (say, in Markdown) on vscode. I think this is fundamentally a problem of wrong word-splitting for CJK languages (and perhaps Thai, too), which use no spaces to delimit words. A similar problem happens when you double-click a word in a line (the whole line will be selected instead of the target word) and when you trigger an autocompletion using <kbd>Ctrl</kbd>+<kbd>Space</kbd> (a whole line will be shown as a candidate).

Dictionary-based word segmentation would be ideal (it is available in MS Word, the Google Chrome browser, etc.), but it is not 100% accurate, and I'm not sure it is really necessary for a code editor. Another practical approach, which works at least for Japanese, is to split words based on character type, because typical Japanese text is a mixture of kanji, hiragana, and katakana (this algorithm is implemented in most Japanese text editors and even in MS Notepad.exe). Character types can be easily determined from Unicode code points.

Example:

(1) 吾輩は猫である。名前はまだない。
(2) 吾輩|は|猫|で|ある|。|名前|は|まだ|ない|。
(3) 吾輩|は|猫|である|。|名前|はまだない|。

(1): Natural Japanese text with two sentences; 。 is the Japanese period.
(2): Dictionary-based word boundaries (|), available in MS Word, Chrome, etc.
(3): Code-point-based kana-kanji boundaries, available in Firefox, Notepad.exe, etc.
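
Approach (3) can be sketched in TypeScript as follows, assuming hand-picked Unicode ranges: hiragana U+3040 to U+309F, katakana U+30A0 to U+30FF (which includes the long-vowel mark ー), basic CJK ideographs U+4E00 to U+9FFF plus the iteration mark 々 (U+3005), and CJK symbols and punctuation U+3000 to U+303F. `charType` and `segmentByCharType` are hypothetical names; this illustrates the general idea and is not the exact algorithm used by Notepad, Firefox, or any particular editor.

```typescript
// Character classes used for boundary detection. The ranges are an
// assumption for illustration and are not exhaustive.
type CharType = "kanji" | "hiragana" | "katakana" | "punct" | "other";

function charType(ch: string): CharType {
  const cp = ch.charCodeAt(0);
  // Checked first so that 々 (U+3005) counts as kanji, not punctuation.
  if ((cp >= 0x4e00 && cp <= 0x9fff) || cp === 0x3005) return "kanji";
  if (cp >= 0x3040 && cp <= 0x309f) return "hiragana";
  if (cp >= 0x30a0 && cp <= 0x30ff) return "katakana";
  if (cp >= 0x3000 && cp <= 0x303f) return "punct"; // e.g. 。 and 、
  return "other";
}

// Splits text into maximal runs of the same character type.
// Punctuation is never merged, so each mark becomes its own segment.
function segmentByCharType(text: string): string[] {
  const segments: string[] = [];
  let current = "";
  let currentType: CharType | null = null;
  for (const ch of text) {
    const t = charType(ch);
    if (current !== "" && (t !== currentType || t === "punct")) {
      segments.push(current);
      current = "";
    }
    current += ch;
    currentType = t;
  }
  if (current !== "") segments.push(current);
  return segments;
}

// Usage: reproduces line (3) of the example.
console.log(segmentByCharType("吾輩は猫である。名前はまだない。").join("|"));
// 吾輩|は|猫|である|。|名前|はまだない|。
```

Note that text written purely in hanzi, such as 本文的学习公式, consists of a single character type, so this heuristic alone would still treat it as one word; Chinese needs per-character stepping or dictionary-based segmentation instead.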

There is already a popular extension that does (3) above for Japanese text. Unfortunately, it works only with <kbd>Ctrl</kbd>+<kbd>←</kbd>/<kbd>→</kbd> and nowhere else; it does not affect double-clicks, <kbd>Ctrl</kbd>+<kbd>D</kbd>, autocompletion, text search, and so on.

Personally, I think (3) should be implemented as part of VSCode's basic functionality, considering that it is available in every other decent text editor. The dictionary-based solution (2) may be too costly to include in the main vscode repository, but I hope there will be a way for extension developers to override the word-boundary detection algorithm or the double-click behavior.


By the way, in the meantime you can alleviate this problem by tweaking the `editor.wordSeparators` setting and adding multibyte punctuation marks such as 。 and 、. With this, <kbd>Ctrl</kbd>+<kbd>←</kbd>/<kbd>→</kbd> will at least stop at (double-byte) periods and commas.
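
To see why adding 。 and 、 to `editor.wordSeparators` helps, here is a simplified TypeScript model of separator-based word splitting. It assumes that each separator character is a stop of its own and that any other run of non-whitespace characters is one word; this is an approximation for illustration, not VS Code's actual word-navigation code, and `splitWords` is a hypothetical name. The `asciiOnly` string below is ASCII punctuation chosen for the example and is not claimed to be VS Code's default value.

```typescript
// Simplified model (an assumption, not VS Code's implementation):
// every character in `separators` is its own stop, whitespace ends a word,
// and any run of other characters counts as a single word.
function splitWords(text: string, separators: string): string[] {
  const words: string[] = [];
  let current = "";
  for (const ch of text) {
    const isSep = separators.indexOf(ch) !== -1;
    if (isSep || /\s/.test(ch)) {
      if (current !== "") words.push(current);
      current = "";
      if (isSep) words.push(ch);
    } else {
      current += ch;
    }
  }
  if (current !== "") words.push(current);
  return words;
}

// Hypothetical separator set: ASCII punctuation only.
const asciiOnly = "`~!@#$%^&*()-=+[{]}\\|;:'\",.<>/?";
const sentence = "吾輩は猫である。名前はまだない。";

// With ASCII separators only, the whole line is one "word".
console.log(splitWords(sentence, asciiOnly).length); // 1
// After adding 。 and 、, the line splits at each Japanese period.
console.log(splitWords(sentence, asciiOnly + "。、").join(" / "));
// 吾輩は猫である / 。 / 名前はまだない / 。
```

Under this model the workaround only adds stops at punctuation; the text between two periods is still a single word, which is why the comment treats it as a mitigation rather than a fix.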

rebornix commented on Nov 19, 2019 (6 reactions)

Let's see if we can find time for it during the holidays.
