Bug: `key-spacing` with character encoded as two code units

Environment

Node version: v18.2.0
npm version: 8.10.0
Local ESLint version: 8.16.0
Global ESLint version: none
Operating System: Ubuntu 22.04 LTS

What parser are you using?

Default (Espree)

What did you do?

Configuration

{
    "parserOptions": {
        "ecmaVersion": "latest",
        "sourceType": "module"
    },
    "rules": {
        "key-spacing": [2, { "align": "value" }]
    }
}

const foo = {
    "a": "bar",
    "𐌘": "baz" // U+10318 https://en.wikipedia.org/wiki/Old_Italic_(Unicode_block)
};

Object.keys(foo).forEach((key) => {
    console.log(key, key.length, [...key].length);
});
// a 1 1
// 𐌘 2 1

What did you expect to happen?

No error reported.

What actually happened?

$ npx eslint index.js

/home/regseb/testcase/index.js
  2:10  error  Missing space before value for key 'a'  key-spacing

✖ 1 problem (1 error, 0 warnings)
  1 error and 0 warnings potentially fixable with the `--fix` option.

Participation

I am willing to submit a pull request for this issue.

Additional comments

JavaScript strings are sequences of UTF-16 code units:

A code point up to and including U+FFFF is encoded as one code unit.
A code point above U+FFFF is encoded as two code units (a surrogate pair).

The `length` property of a string counts code units, not characters.
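
To make the difference concrete, here is a minimal sketch using the same key as the test case. The variable names are only for illustration.

```javascript
// U+10318 lies above U+FFFF, so UTF-16 stores it as a surrogate pair.
const key = "\u{10318}";

// .length counts UTF-16 code units.
const codeUnits = key.length;

// The string iterator walks code points, so spreading yields one element here.
const codePoints = [...key].length;

console.log(codeUnits, codePoints); // 2 1

// The two code units are the high and low surrogates.
console.log(key.charCodeAt(0).toString(16), key.charCodeAt(1).toString(16)); // d800 df18
```

Any width computed from `length` therefore sees this key as one column wider than it looks in the source file.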

I think we need to change the function getKeyWidth().

Top GitHub Comments

nzakas commented, Jul 13, 2022

@regseb please open a separate issue for that.

grapheme-splitter seems like the right approach to me. Thanks @ljharb!

mdjermanovic commented, Jul 16, 2022

> grapheme-splitter seems like the right approach to me.

Agreed, marking as accepted.
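
For readers who want to see what counting graphemes looks like, below is a rough sketch. It is not ESLint's `getKeyWidth()` and does not use grapheme-splitter itself. It assumes the built-in `Intl.Segmenter` (available in recent Node versions) as a stand-in, and `countGraphemes` is a hypothetical helper name.

```javascript
// Hypothetical helper for illustration; this is not ESLint's getKeyWidth().
// Intl.Segmenter is used here as a built-in stand-in for grapheme-splitter.
const segmenter = new Intl.Segmenter("en", { granularity: "grapheme" });

function countGraphemes(text) {
    // segment() returns an iterable with one entry per grapheme cluster.
    return [...segmenter.segment(text)].length;
}

// "a", the Old Italic letter from the test case, and "e" + combining acute accent.
const keys = ["a", "\u{10318}", "e\u0301"];

for (const key of keys) {
    // code units, code points, graphemes
    console.log(key.length, [...key].length, countGraphemes(key));
}
// 1 1 1
// 2 1 1
// 2 2 1
```

The last key shows why counting code points alone is not enough: a base letter plus a combining mark is two code points but occupies a single visible column.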
