\x5c is single byte but seems to represent two slashes
This may be related to #2778, as the escaping behavior still seems confusing.
For example, in this rule we have \x5c. Unquoting this string, we get the regular expression
(?:^|[^\])\[cdeghijklmpqwxyz123456789]
This is not a valid regular expression; presumably it is supposed to be
(?:^|[^\\])\\[cdeghijklmpqwxyz123456789]
The only explanation I can come up with is that \x5c is being treated as a magical control sequence that represents two backslashes in a regular expression. There isn't really a way to reconcile the fact that, when the string is converted into regular expression syntax, this control sequence represents two characters (\\) while arbitrary bytes, like in this one, are each just one.
Should the rules in which \x5c represents a backslash match in the regular expression all use \x5c\x5c instead?
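To make the question concrete, here is a minimal sketch in Python of the unquoting described above. The helper names unquote_hex and compiles are hypothetical and exist only for illustration, and Python's re module stands in for whatever regex engine actually consumes the rule, so this is an assumption-laden reproduction rather than the project's real code path.

```python
import re

# Pattern as it appears in the rule quoted in this issue.
RULE_PATTERN = r"(?:^|[^\x5c])\x5c[cdeghijklmpqwxyz123456789]"


def unquote_hex(pattern: str) -> str:
    """Hypothetical unquoting step: turn every \\xHH escape into the one byte it names."""
    return re.sub(
        r"\\x([0-9a-fA-F]{2})",
        lambda m: chr(int(m.group(1), 16)),
        pattern,
    )


def compiles(pattern: str) -> bool:
    """Return True if Python's re accepts the pattern, False on a syntax error."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# Unquoting first: each \x5c becomes a single backslash, which then escapes
# the character that follows it in the regex.
unquoted = unquote_hex(RULE_PATTERN)

# Same rule with every \x5c doubled, as proposed at the end of this issue.
doubled = unquote_hex(RULE_PATTERN.replace(r"\x5c", r"\x5c\x5c"))

print(unquoted, compiles(unquoted))
print(doubled, compiles(doubled))

# Handing the rule text to the engine without unquoting it first.
print(RULE_PATTERN, compiles(RULE_PATTERN))
```

In this sketch the unquoted pattern is rejected, while the doubled form is accepted. Python's re also accepts the original text when it is not unquoted beforehand, because that engine resolves \x5c to a literal backslash itself; whether the engine used for these rules behaves the same way is not established here.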
Top GitHub Comments
@RedXanadu Yeah, it's linked in the original message, e.g.
(?:^|[^\x5c])\x5c[cdeghijklmpqwxyz123456789]
fails because when unquoted it becomes
(?:^|[^\])\[cdeghijklmpqwxyz123456789]
which isn't valid regex syntax (\] is a valid escape sequence in regex syntax, so the result is an unclosed bracket).
Thanks for the mini-adventure @anuraaga 🙂