\x5c is single byte but seems to represent two slashes

This may be related to #2778, as the escaping behavior still seems confusing.

For example, in this rule:

https://github.com/coreruleset/coreruleset/blob/v4.0/dev/rules/REQUEST-920-PROTOCOL-ENFORCEMENT.conf#L1718

The rule contains \x5c. Unquoting this string gives the regular expression (?:^|[^\])\[cdeghijklmpqwxyz123456789]. This is not a valid regular expression; presumably it is supposed to be (?:^|[^\\])\\[cdeghijklmpqwxyz123456789].

The only way I can make sense of this is that \x5c is being treated as a magic control sequence that stands for two backslashes in a regular expression. That is hard to reconcile: when the string is converted into regular expression syntax, this sequence represents two characters (\\), while arbitrary escaped bytes, like the ones in this rule, each represent just one:

https://github.com/coreruleset/coreruleset/blob/v4.0/dev/rules/REQUEST-934-APPLICATION-ATTACK-GENERIC.conf#L241

Should the rules where \x5c is meant to match a literal backslash in the regular expression all use \x5c\x5c instead?
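
As a rough illustration of the question above, the following Python sketch compares three readings of the rule's pattern. It assumes Python's re module as a stand-in for the engine the rules actually run on, and naive_unquote is a hypothetical helper (not part of CRS or any library) that replaces the text \x5c with a single backslash before the pattern reaches the regex compiler.

```python
import re

# The pattern as quoted in the issue, kept as raw text: here \x5c is still
# four characters (backslash, x, 5, c), not a decoded byte.
RULE_PATTERN = r"(?:^|[^\x5c])\x5c[cdeghijklmpqwxyz123456789]"


def naive_unquote(pattern: str) -> str:
    # Hypothetical unquoting step (an assumption for illustration):
    # swap the four-character text \x5c for one literal backslash.
    return pattern.replace(r"\x5c", "\\")


def compiles(pattern: str) -> bool:
    # Report whether Python's re module accepts the pattern.
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# Reading 1: the regex engine interprets \x5c itself.
engine_decodes = compiles(RULE_PATTERN)

# Reading 2: \x5c is unquoted to one backslash before compilation.
unquoted = naive_unquote(RULE_PATTERN)
unquoted_compiles = compiles(unquoted)

# Reading 3: every \x5c becomes an escaped backslash (two characters).
doubled = RULE_PATTERN.replace(r"\x5c", r"\\")
doubled_compiles = compiles(doubled)

print(unquoted, unquoted_compiles)
print(doubled, doubled_compiles)
print(RULE_PATTERN, engine_decodes)
```

Under these assumptions, the pattern only breaks when \x5c is expanded to a backslash before compilation. An engine that interprets \x5c itself sees a single backslash character and needs no doubling, which may be why the same sequence appears to mean one byte in one place and two characters in another.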

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 16 (16 by maintainers)

Top GitHub Comments

1 reaction
anuraaga commented, Sep 26, 2022

@RedXanadu Yes, it's linked in the original message. For example, (?:^|[^\x5c])\x5c[cdeghijklmpqwxyz123456789] fails because, when unquoted, it becomes (?:^|[^\])\[cdeghijklmpqwxyz123456789], which isn’t valid regex syntax (\] is a valid escape sequence in regex syntax, so the bracket is left unclosed).

0 reactions
RedXanadu commented, Sep 27, 2022

Thanks for the mini-adventure @anuraaga 🙂
