New rule: Require regex character set ranges to be obvious (A-Z, not A-z)
Please describe what the rule should do:
When the rule sees a range in a regular expression character class (for example, /[a-z]/), it should require the range to be a subset of one of the following ranges:
- a-z
- A-Z
- 0-9
This prevents mistakes like A-z, which appear to work but match too much: the ASCII range from A to z also contains the six characters between Z and a ([ \ ] ^ _ `), so /[A-z]/.test("^") returns true, which may be unexpected. Additionally, the rule warns about unintended uses of the range operator, for instance when a regular expression is used to parse a grammar (where - needs to be explicitly escaped).
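The following sketch shows exactly what A-z adds beyond A-Z and a-z. It lists the ASCII characters strictly between Z (0x5A) and a (0x61), then checks them against /[A-z]/ and the intended /[A-Za-z]/. The variable name is only illustrative.

```javascript
// Collect the ASCII characters that sit strictly between "Z" (0x5A) and "a" (0x61).
const between = [];
for (let code = "Z".charCodeAt(0) + 1; code < "a".charCodeAt(0); code++) {
  between.push(String.fromCharCode(code));
}

console.log(between.join(" ")); // [ \ ] ^ _ `

// [A-z] accepts every one of them; the intended [A-Za-z] accepts none.
console.log(between.every((ch) => /[A-z]/.test(ch))); // true
console.log(between.some((ch) => /[A-Za-z]/.test(ch))); // false
```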
This rule shouldn’t apply to Unicode or hex escapes, because using those signals that the string may actually contain binary data. It also shouldn’t apply to characters outside the ASCII range, because there are legitimate uses for those (for instance, matching code point blocks such as Arabic), and handling them would require deciding what is reasonable as far as this rule is concerned; it can be extended later if needed. Also, as mentioned above, the rule only requires a subset, so code like /[d-g]/ is perfectly fine.
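The check described above can be written as a small function over a range's two endpoints, each taken exactly as written in the pattern. The following is a minimal sketch, not part of the proposal. The name isObviousRange and the handling of escapes are assumptions. Hex escapes, Unicode escapes, and non-ASCII endpoints are skipped, as the proposal says. Identity escapes of punctuation such as \/ are assumed to stand for the character itself, so the operator example in this issue is still caught.

```javascript
// Allowed "obvious" ranges from the proposal, as [low, high] code points.
const ALLOWED_RANGES = [["a", "z"], ["A", "Z"], ["0", "9"]].map(
  ([lo, hi]) => [lo.charCodeAt(0), hi.charCodeAt(0)]
);

// Returns the code point an endpoint denotes, or null if the rule skips it.
// `source` is the endpoint as written in the pattern, e.g. "A", "\\/", "\\x41".
function endpointCode(source) {
  if (source.length === 1) return source.charCodeAt(0);
  // Assumption: identity escapes of punctuation (\/, \-, \.) mean the character itself.
  if (/^\\[^A-Za-z0-9]$/.test(source)) return source.charCodeAt(1);
  // \x.., \u.... and any other escape are exempt.
  return null;
}

// Hypothetical helper: true if the range from-to is allowed (or out of scope).
function isObviousRange(fromSource, toSource) {
  const from = endpointCode(fromSource);
  const to = endpointCode(toSource);
  if (from === null || to === null) return true; // escapes are not checked
  if (from > 0x7f || to > 0x7f) return true; // non-ASCII ranges are out of scope
  // The range must fit entirely inside one allowed range.
  return ALLOWED_RANGES.some(([lo, hi]) => from >= lo && to <= hi);
}

console.log(isObviousRange("a", "z")); // true
console.log(isObviousRange("d", "g")); // true
console.log(isObviousRange("A", "z")); // false
console.log(isObviousRange("+", "\\/")); // false
console.log(isObviousRange("\\x41", "\\x7A")); // true (hex escapes are exempt)
console.log(isObviousRange("α", "ω")); // true (outside ASCII)
```

Checking "subset of one allowed range" is stricter than checking "each endpoint is a letter or digit". The second approach would accept A-z, because both A and z are letters.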
What category of rule is this? (place an “X” next to just one item)
[ ] Enforces code style
[X] Warns about a potential error
[ ] Suggests an alternate way of doing something
[ ] Other (please specify:)
Provide 2-3 code examples that this rule will warn about:
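// "+-\/" is read as the range "+" through "/", so this also matches "," and "."
const OPERATOR_REGEX = /[+-\/*]/

// A-z also covers [ \ ] ^ _ and `, so this condition is true
const word = "Hello^world"
if (/^[A-z]+$/.test(word)) {
  alert("An identifier")
}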
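The following sketch makes the two examples concrete. It lists the ASCII characters the first class accepts and runs the second check directly. The matchedAscii helper is hypothetical and only probes the 128 ASCII characters. The escaped /[+\-*\/]/ is one possible fix, not the only one.

```javascript
// Hypothetical helper: every ASCII character a regex accepts, in code point order.
function matchedAscii(re) {
  let out = "";
  for (let code = 0; code < 128; code++) {
    const ch = String.fromCharCode(code);
    if (re.test(ch)) out += ch;
  }
  return out;
}

// First example: "+-\/" parses as the range "+" (0x2B) through "/" (0x2F).
console.log(matchedAscii(/[+-\/*]/)); // *+,-./
// Escaping the hyphen keeps it literal.
console.log(matchedAscii(/[+\-*\/]/)); // *+-/

// Second example: "^" (0x5E) lies inside A-z, so the whole word is accepted.
console.log(/^[A-z]+$/.test("Hello^world")); // true
console.log(/^[A-Za-z]+$/.test("Hello^world")); // false
```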
Why should this rule be included in ESLint (instead of a plugin)?
Writing regular expressions is hard. Ranges like [A-z] are quietly wrong, even if they appear to work. It’s easy to forget to escape - inside a character class and then be confused about why the code doesn’t work (I admit this happened a lot when I was learning regular expressions).
This shouldn’t be a plugin: a plugin that checks this would likely go unused because most ESLint users wouldn’t know it exists, while this is a rather serious error that can be confusing to debug. The rule is very unlikely to produce false positives, and if it does, it’s easy to switch to hex escapes. Regular expressions are part of the language, not an external library.
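Here is a quick illustration of that escape hatch. Writing the same range with hex escapes (0x41 is A, 0x7A is z) leaves the matched set unchanged while signalling that the wide range is intentional. The comparison below covers every BMP code unit. It is only a sanity check and not part of the proposed rule.

```javascript
// The same range written with literal endpoints and with hex escapes.
const literalRange = /[A-z]/;
const hexRange = /[\x41-\x7A]/;

// Count code units where the two spellings disagree.
let differences = 0;
for (let code = 0; code <= 0xffff; code++) {
  const ch = String.fromCharCode(code);
  if (literalRange.test(ch) !== hexRange.test(ch)) differences++;
}
console.log(differences); // 0
```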
There is prior art for this lint in Perl.
I decided to propose this rule after realizing that no-useless-escape complains about \- in a character class when - is at the beginning or end of the class. I think the no-useless-escape rule is fine, as long as it is backed by another rule that finds mistakes with regex ranges, because discouraging people from writing \- makes such mistakes more likely.
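The following sketch shows the failure mode described here; the variable names are illustrative. A trailing - is literal, so escaping it looks useless. Once another character is appended after it, though, the hyphen silently becomes a range operator.

```javascript
// A trailing "-" is a literal hyphen, so "\-" at this position counts as a useless escape.
const before = /[+*-]/;
// Appending "/" later turns "*-\/" into the range "*" (0x2A) through "/" (0x2F),
// which also accepts "+", ",", "-" and ".".
const after = /[+*-\/]/;
// Keeping the escape preserves the intent when the class is extended.
const escapedHyphen = /[+*\-\/]/;

console.log(before.test(","), after.test(","), escapedHyphen.test(",")); // false true false
```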
Issue Analytics
- Created 6 years ago
- Reactions: 4
- Comments: 7 (4 by maintainers)
Top GitHub Comments
Yes, allowing ranges does appear to be the implication of “in event it turns out to be, it’s easy to use hex escapes”
No, it was not. We define consensus as having three 👍s from team members, as well as a team member willing to champion the proposal. This is a high bar by design – we can’t realistically accept and maintain every feature request in the long term, so we only accept feature requests which are useful enough that there is consensus among the team that they’re worth adding.
Since ESLint is pluggable and can load custom rules at runtime, the lack of consensus among the ESLint team doesn’t need to be a blocker for you using a rule like this in your project, if you’d find it useful. It just means that you would need to implement the rule yourself, rather than using a bundled rule that is packaged with ESLint.
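The following is a minimal sketch of such a self-implemented rule. It assumes ESLint's documented { meta, create } rule object and ESTree regex literals, which arrive as Literal nodes carrying node.regex.pattern. The message text and the helper names (endpointCode, findRanges, isObviousRange) are hypothetical. The hand-written scanner is deliberately simplified. It only checks regex literals, not new RegExp(...), and it does not understand \u{...}, \p{...}, \cX, or the nested classes allowed by the v flag.

```javascript
// Code point denoted by a range endpoint as written, or null when the rule
// should not check it (\x.., \u...., \n, \w, ...).
function endpointCode(atom) {
  if (atom.length === 1) return atom.charCodeAt(0);
  if (/^\\[^A-Za-z0-9]$/.test(atom)) return atom.charCodeAt(1); // \/, \-, \. ...
  return null;
}

// Simplified scanner: [from, to] source text of every range inside a character class.
function findRanges(pattern) {
  const ranges = [];
  let i = 0;
  while (i < pattern.length) {
    if (pattern[i] === "\\") { i += 2; continue; } // escape outside a class
    if (pattern[i] !== "[") { i += 1; continue; }
    i += pattern[i + 1] === "^" ? 2 : 1; // skip "[" and an optional "^"
    const atoms = [];
    while (i < pattern.length && pattern[i] !== "]") {
      const atom = pattern[i] === "\\"
        ? /^\\(?:x[0-9A-Fa-f]{2}|u[0-9A-Fa-f]{4}|[^])/.exec(pattern.slice(i))[0]
        : pattern[i];
      atoms.push(atom);
      i += atom.length;
    }
    i += 1; // skip the closing "]"
    // An unescaped "-" between two atoms forms a range; at either edge it is literal.
    for (let k = 1; k + 1 < atoms.length; k++) {
      if (atoms[k] === "-") {
        ranges.push([atoms[k - 1], atoms[k + 1]]);
        k += 2; // the upper endpoint cannot also start the next range
      }
    }
  }
  return ranges;
}

const ALLOWED = [["a", "z"], ["A", "Z"], ["0", "9"]].map(
  ([lo, hi]) => [lo.charCodeAt(0), hi.charCodeAt(0)]
);

function isObviousRange(from, to) {
  const lo = endpointCode(from);
  const hi = endpointCode(to);
  if (lo === null || hi === null || lo > 0x7f || hi > 0x7f) return true; // out of scope
  return ALLOWED.some(([min, max]) => lo >= min && hi <= max);
}

// Rule object in ESLint's { meta, create } shape.
const rule = {
  meta: {
    type: "problem",
    schema: [],
    messages: {
      unobviousRange: "Range '{{from}}-{{to}}' is not a subset of a-z, A-Z or 0-9.",
    },
  },
  create(context) {
    return {
      // Regex literals are Literal nodes that carry a `regex` property.
      Literal(node) {
        if (!node.regex) return;
        for (const [from, to] of findRanges(node.regex.pattern)) {
          if (!isObviousRange(from, to)) {
            context.report({ node, messageId: "unobviousRange", data: { from, to } });
          }
        }
      },
    };
  },
};
```

To use it, the rule object would be exported from a local plugin and enabled in the project's ESLint configuration. A more robust version would likely build on a real regular expression parser rather than this scanner.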