New rule: Require regex character sets ranges to be obvious (A-Z, not A-z)

Please describe what the rule should do:

When the rule sees a range in a regular expression character set (for example, /[a-z]/), it should require the range to be a subset of one of the following ranges:

  • a-z
  • A-Z
  • 0-9

This prevents mistakes like A-z, which works but matches too much. For instance, /[A-z]/.test("^") returns true, which may be unexpected. It also warns about unintended uses of the range operator, for instance when a regular expression is used to parse a grammar and a literal - has not been escaped.
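To make the "matches too much" point concrete, here is a small sketch that lists which ASCII characters a character set accepts. The helper name asciiMatches is made up for this illustration and is not part of the proposal.

```javascript
// List every ASCII character that a single-character regex matches.
// `asciiMatches` is a hypothetical helper used only for this illustration.
function asciiMatches(regex) {
  const matched = [];
  for (let code = 0; code < 128; code++) {
    const ch = String.fromCharCode(code);
    if (regex.test(ch)) matched.push(ch);
  }
  return matched.join("");
}

// Characters accepted by [A-z] that are not letters: the six code points
// that sit between "Z" (0x5A) and "a" (0x61) in ASCII.
const extras = asciiMatches(/[A-z]/).replace(/[A-Za-z]/g, "");
console.log(extras); // [\]^_`

// An unescaped "-" between "+" and "/" forms a range from 0x2B to 0x2F.
const operators = asciiMatches(/[+-\/*]/);
console.log(operators); // *+,-./
```

In the second case the set was presumably meant to match only the four arithmetic operators, yet it also accepts a comma and a period.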

This rule shouldn’t apply to unicode/hex escapes, as using those suggests that the string may actually store binary data. It also shouldn’t apply to characters outside the ASCII range, as there are legitimate uses of those (for instance, matching code point blocks like Arabic), and covering them would require deciding what is reasonable as far as this rule is concerned; it can be extended later if needed. Also, as I mentioned, the rule only requires a subset, so code like /[d-g]/ is perfectly fine.
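The check described above could be sketched roughly as follows. This is only one reading of the proposal: the function names are hypothetical, the endpoints are passed as they are written in the pattern source, and treating a range as exempt when either endpoint is an escape or non-ASCII is an assumption. Class escapes such as \d are not handled.

```javascript
// The three ranges the proposal considers obvious.
const ALLOWED_RANGES = [["a", "z"], ["A", "Z"], ["0", "9"]];

// True for endpoints written as \xHH, \uHHHH or \u{...} in the pattern source.
function isHexOrUnicodeEscape(src) {
  return /^\\[xu]/.test(src);
}

// Strip the backslash from an escaped literal such as "\/" (assumption:
// anything else that starts with a backslash is an escaped literal character).
function literalChar(src) {
  return src.startsWith("\\") ? src.slice(1) : src;
}

// Decide whether the range fromSrc-toSrc would pass the proposed rule.
function isAcceptableRange(fromSrc, toSrc) {
  // Hex/unicode escapes are exempt (assumed: either endpoint is enough).
  if (isHexOrUnicodeEscape(fromSrc) || isHexOrUnicodeEscape(toSrc)) return true;

  const from = literalChar(fromSrc);
  const to = literalChar(toSrc);

  // Characters outside ASCII are exempt as well.
  if (from.charCodeAt(0) > 127 || to.charCodeAt(0) > 127) return true;

  // Otherwise the range must be a subset of a-z, A-Z or 0-9.
  return ALLOWED_RANGES.some(([lo, hi]) => lo <= from && from <= to && to <= hi);
}

console.log(isAcceptableRange("d", "g"));         // true
console.log(isAcceptableRange("A", "z"));         // false
console.log(isAcceptableRange("+", "\\/"));       // false
console.log(isAcceptableRange("\\x00", "\\x1f")); // true
```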

What category of rule is this? (place an “X” next to just one item)

  • [ ] Enforces code style
  • [X] Warns about a potential error
  • [ ] Suggests an alternate way of doing something
  • [ ] Other (please specify:)

Provide 2-3 code examples that this rule will warn about:

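// Intended to match only + - / *, but +-\/ is a range that also matches , and .
const OPERATOR_REGEX = /[+-\/*]/

const word = "Hello^world"
// [A-z] also matches [ \ ] ^ _ `, so this check passes despite the ^
if (/^[A-z]+$/.test(word)) {
    alert("An identifier")
}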

Why should this rule be included in ESLint (instead of a plugin)?

Writing regular expressions is hard. Ranges like [A-z] are quietly wrong, even if they appear to work. It’s easy to forget to escape - in a character set and then be confused about why the code doesn’t work (this happened to me a lot back when I was learning regular expressions, I do admit).

This shouldn’t be a plugin: a plugin that checks this would likely go unused, because users of ESLint wouldn’t know about it, while this is a rather serious error that may be confusing to debug. This rule is very unlikely to be a false positive, and in event it turns out to be, it’s easy to use hex escapes. Regular expressions are part of the language, not an external library.

There is prior art for this lint in Perl.

I decided to propose this rule after realizing that no-useless-escape complains about \- in character sets when the - is at the beginning or end of the set. I think the no-useless-escape rule is fine, as long as it is supported by another rule that finds mistakes in regex ranges, since encouraging people not to write \- makes such mistakes more likely.

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Reactions: 4
  • Comments: 7 (4 by maintainers)

Top GitHub Comments

1 reaction
Zarel commented, Jun 16, 2017

Yes, allowing ranges does appear to be the implication of “in event it turns out to be, it’s easy to use hex escapes”

0 reactions
not-an-aardvark commented, Aug 14, 2017

Was your closing a mistake?

No, it was not. We define consensus as having three 👍s from team members, as well as a team member willing to champion the proposal. This is a high bar by design – we can’t realistically accept and maintain every feature request in the long term, so we only accept feature requests which are useful enough that there is consensus among the team that they’re worth adding.

Since ESLint is pluggable and can load custom rules at runtime, the lack of consensus among the ESLint team doesn’t need to be a blocker for you using a rule like this in your project, if you’d find it useful. It just means that you would need to implement the rule yourself, rather than using a bundled rule that is packaged with ESLint.
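For anyone who wants to go the custom-rule route, a rough sketch of what such a rule might look like is shown below. It uses the standard ESLint rule shape (a meta object plus a create(context) function returning AST visitors), but the scanner, the message text, and the exemption of every letter or digit escape (\x, \u, \d, \n and so on) are assumptions made for this illustration, not something the ESLint team or the issue author specified. It only inspects regex literals, not patterns passed to new RegExp().

```javascript
// Split a pattern into tokens: hex/unicode escapes, other escapes, single chars.
const TOKEN = /\\u\{[0-9a-fA-F]+\}|\\u[0-9a-fA-F]{4}|\\x[0-9a-fA-F]{2}|\\[\s\S]|[\s\S]/gu;
// A range is "obvious" if both endpoints fall in the same a-z, A-Z or 0-9 block.
const OBVIOUS = /^(?:[a-z]-[a-z]|[A-Z]-[A-Z]|[0-9]-[0-9])$/;

const isExemptEscape = (t) => /^\\[A-Za-z0-9]/.test(t); // assumption: skip \x, \u, \d, \n, ...
const literal = (t) => (t.length > 1 && t[0] === "\\" ? t.slice(1) : t);

// Return the source text of every non-obvious range inside character sets.
function suspiciousRanges(pattern) {
  const tokens = pattern.match(TOKEN) || [];
  const found = [];
  let inClass = false;
  for (let i = 0; i < tokens.length; i++) {
    const tok = tokens[i];
    if (!inClass) {
      if (tok === "[") {
        inClass = true;
        if (tokens[i + 1] === "^") i++; // skip the negation marker
      }
      continue;
    }
    if (tok === "]") { inClass = false; continue; }
    const to = tokens[i + 2];
    // Not a range unless the next token is "-" followed by an upper endpoint.
    if (tokens[i + 1] !== "-" || to === undefined || to === "]") continue;
    i += 2; // consume "-" and the upper endpoint
    if (isExemptEscape(tok) || isExemptEscape(to)) continue;
    const from = literal(tok);
    const upper = literal(to);
    if (from.codePointAt(0) > 127 || upper.codePointAt(0) > 127) continue;
    if (!OBVIOUS.test(`${from}-${upper}`)) found.push(`${tok}-${to}`);
  }
  return found;
}

// Hypothetical rule object; a plugin or local rules setup would export it.
const rule = {
  meta: {
    type: "problem",
    docs: { description: "require obvious character set ranges in regex literals" },
    schema: [],
  },
  create(context) {
    return {
      Literal(node) {
        if (!node.regex) return; // only regex literals carry a `regex` property
        for (const range of suspiciousRanges(node.regex.pattern)) {
          context.report({ node, message: `Unexpected character set range '${range}'.` });
        }
      },
    };
  },
};
```

A hand-written scanner like this is only good enough for a sketch; a production rule would more likely rely on a proper regular expression parser to handle every syntax corner case.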
