Inconsistent representation of the backslash character in search patterns

Describe the bug

In brief: Inconsistent representation of the backslash character in search patterns means that some CRS rules behave subtly differently depending on the platform in use.

Following on from #2140, a search of the entire Core Rule Set (v3.4/dev branch) has revealed a handful of instances of the pattern \\ being used to represent a single backslash character.

The pattern \\ works correctly with libmodsecurity; however, Apache (mod_security2) requires the pattern \\\\ to represent a single backslash character correctly (due to slight differences in how rules are parsed, as explained in detail here).

Most CRS rules use a portable representation of a backslash, e.g. [\\\\], which works as intended with both libmodsecurity and Apache. Some rules only use \\, however. This inconsistency means that some CRS rules behave subtly differently depending on the platform in use.
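
The difference can be illustrated with a minimal Python sketch. It assumes a deliberately simplified model: Apache/mod_security2 collapses each \\ pair into a single \ before the regex engine sees the pattern, while libmodsecurity passes the pattern through unchanged. The function names and the \\x41 pattern are hypothetical (not taken from any CRS rule), and Python's re module stands in for PCRE here.

```python
import re

def apache_v2_view(pattern: str) -> str:
    # Simplified assumption: every "\\" pair in the rule becomes a single "\".
    return pattern.replace("\\\\", "\\")

def libmodsecurity_v3_view(pattern: str) -> str:
    # Simplified assumption: the pattern reaches the regex engine unchanged.
    return pattern

def matches(pattern: str, text: str) -> bool:
    return re.search(pattern, text) is not None

# Hypothetical patterns: both are meant to match a literal backslash
# followed by the characters "x41".
NON_PORTABLE = r"\\x41"
PORTABLE = r"[\\\\]x41"

LITERAL_TEXT = r"\x41"  # four characters: backslash, x, 4, 1

for name, pattern in (("non-portable", NON_PORTABLE), ("portable", PORTABLE)):
    v2 = matches(apache_v2_view(pattern), LITERAL_TEXT)
    v3 = matches(libmodsecurity_v3_view(pattern), LITERAL_TEXT)
    print(f"{name}: apache={v2} libmodsecurity={v3}")
```

Under this model the non-portable form matches only in the libmodsecurity view, because the Apache view turns it into the hex escape \x41 (the letter "A"). The character-class form matches in both views.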

Steps to reproduce

1. Test one of the rules highlighted below on both Apache/mod_security2 and nginx/libmodsecurity.

2. Test using a pattern containing a \ character that should match.

3. Observe that the rule may be triggered on one platform but not the other, e.g. the rule may match with nginx but not match with Apache.


Example

Attempting to trigger rule 933210 with the pattern (sys\tem)('uname');

Command: curl -o /dev/null -v "localhost:80/?test=(sys\tem)('uname');"

Test 1, against Apache/mod_security2: < HTTP/1.1 200 OK (No rules triggered)

Test 2, against nginx/libmodsecurity: < HTTP/1.1 403 Forbidden

…ModSecurity: Warning. Matched "Operator `Rx'…against variable `ARGS:test' (Value: `(sys\tem)('uname');' ) …[id "933210"]…
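
For repeating this comparison from a script, a small Python sketch follows. The helper names are hypothetical. Unlike the curl command above, it percent-encodes the payload, so the backslash travels as %5C; this assumes the WAF URL-decodes the argument before the rule is evaluated. The status codes are the ones reported above rather than fetched live.

```python
from urllib.parse import quote, unquote

PAYLOAD = "(sys\\tem)('uname');"  # the test string from the example above

def build_test_url(base_url: str, payload: str, param: str = "test") -> str:
    # Percent-encode every reserved character so the backslash survives as %5C.
    return f"{base_url}/?{param}={quote(payload, safe='')}"

def diverging(statuses: dict) -> bool:
    # True when the platforms did not all answer with the same status code.
    return len(set(statuses.values())) > 1

url = build_test_url("http://localhost:80", PAYLOAD)
print(url)

# Round-trip check: decoding the query value gives back the original payload.
assert unquote(url.split("=", 1)[1]) == PAYLOAD

# Status codes as reported in the two tests above.
observed = {"apache/mod_security2": 200, "nginx/libmodsecurity": 403}
print("platforms diverge:", diverging(observed))
```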

Expected behaviour

All CRS rules are expected to perform similarly on either Apache/mod_security2 or libmodsecurity.

Actual behaviour

Rules using the pattern \\ behave differently depending on the platform.

Additional context

The affected lines can be found with the following search. The matches are much clearer when it is run in a terminal with colour highlighting:

$ grep -E '[^\\]\\\\[^\\]' rules/*
rules/php-errors.data:Cannot access property started with '\\0'
rules/REQUEST-930-APPLICATION-ATTACK-LFI.conf:SecRule REQUEST_URI|ARGS|REQUEST_HEADERS|!REQUEST_HEADERS:Referer|XML:/* "@rx (?:(?:^|[\\/])\.\.[\\/]|[\\/]\.\.(?:[\\/]|$))" \
rules/REQUEST-932-APPLICATION-ATTACK-RCE.conf:# [\^\.\w '\"/\\\\]*\\\\)?[\"\^]*       \\net\share\dir\cmd
rules/REQUEST-932-APPLICATION-ATTACK-RCE.conf:SecRule REQUEST_COOKIES|!REQUEST_COOKIES:/__utm/|REQUEST_COOKIES_NAMES|ARGS_NAMES|ARGS|XML:/* "@rx (?:[*?`\\'][^/\n]+/|\$[({\[#a-zA-Z0-9]|/[^/]+?[*?`\\'])" \
rules/REQUEST-933-APPLICATION-ATTACK-PHP.conf:SecRule REQUEST_COOKIES|!REQUEST_COOKIES:/__utm/|REQUEST_COOKIES_NAMES|REQUEST_FILENAME|ARGS_NAMES|ARGS|XML:/* "@rx (?:(?:\(|\[|\")[a-zA-Z0-9_.$\"'\[\](){}*\s\\]+(?:\)|\]|\")[0-9_.$\"'\[\](){}*\s]*\([a-zA-Z0-9_.$\"'\[\](){}*\s].*\)|\([\s]*string[\s]*\)[\s]*(?:\"|'))\s*[;]" \
rules/REQUEST-941-APPLICATION-ATTACK-XSS.conf:SecRule REQUEST_COOKIES|!REQUEST_COOKIES:/__utm/|REQUEST_COOKIES_NAMES|REQUEST_HEADERS:User-Agent|REQUEST_HEADERS:Referer|ARGS_NAMES|ARGS|XML:/* "@rx (?i)(?:\W|^)(?:javascript:(?:[\s\S]+[=\\\(\[\.<]|[\s\S]*?(?:\bname\b|\\[ux]\d))|data:(?:(?:[a-z]\w+/\w[\w+-]+\w)?[;,]|[\s\S]*?;[\s\S]*?\b(?:base64|charset=)|[\s\S]*?,[\s\S]*?<[\s\S]*?\w[\s\S]*?>))|@\W*?i\W*?m\W*?p\W*?o\W*?r\W*?t\W*?(?:/\*[\s\S]*?)?(?:[\"']|\W*?u\W*?r\W*?l[\s\S]*?\()|\W*?-\W*?m\W*?o\W*?z\W*?-\W*?b\W*?i\W*?n\W*?d\W*?i\W*?n\W*?g[\s\S]*?:[\s\S]*?\W*?u\W*?r\W*?l[\s\S]*?\(" \
rules/REQUEST-942-APPLICATION-ATTACK-SQLI.conf:SecRule REQUEST_COOKIES|!REQUEST_COOKIES:/__utm/|!REQUEST_COOKIES:/_pk_ref/|REQUEST_COOKIES_NAMES|ARGS_NAMES|ARGS|XML:/* "@rx (?:/\*!?|\*/|[';]--|--[\s\r\n\v\f]|--[^-]*?-|[^&-]#.*?[\s\r\n\v\f]|;?\\x00)" \

The \\net\share\dir\cmd line is irrelevant, as it is a comment. The remaining lines appear to be valid issues (and easy fixes, I believe).
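
A rough Python equivalent of the grep above, which also skips comment lines so that hits like the \\net\share\dir\cmd one are filtered out automatically. This is only a sketch: the function name and the sample lines are hypothetical, shortened stand-ins for real rule lines.

```python
import re

# Same idea as the grep above: exactly two backslashes with a
# non-backslash character on each side.
LONE_PAIR = re.compile(r"[^\\]\\\\[^\\]")

def find_lone_pairs(lines):
    """Yield (line_number, line) for non-comment lines with a lone double backslash."""
    for number, line in enumerate(lines, start=1):
        if line.lstrip().startswith("#"):
            continue  # comments are never parsed as patterns
        if LONE_PAIR.search(line):
            yield number, line

# Hypothetical sample input, loosely modelled on the grep output above.
sample = [
    r"# example path: \\net\share\dir\cmd",
    r'SecRule ARGS "@rx ;?\\x00"',
    r'SecRule ARGS "@rx [\\\\]x00"',
]

for number, line in find_lone_pairs(sample):
    print(number, line)
```

Only the second sample line is reported: the first is a comment, and the third contains four consecutive backslashes rather than a lone pair.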

So, the following are affected:

  • ~The php-errors.data file (which is referenced only by (response) rule 953100, PL1)~
  • Rule 930110, PL1 (already being handled in #2140)
  • Rule 932200, PL2
  • Rule 933210, PL1
  • Rule 941170, PL1
  • Rule 942440, PL2

Your Environment

  • CRS version (e.g., v3.2.0): v3.4 (dev branch)
  • Paranoia level setting: Any
  • ModSecurity version (e.g., 2.9.3): 2.9.4 with Apache, 3.0.5 with nginx
  • Web Server and version (e.g., apache 2.4.41): Apache 2.4, nginx 1.20
  • Operating System and version: Whatever flavour of Linux the modsecurity-crs-docker containers use

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Reactions: 2
  • Comments: 19 (19 by maintainers)

Top GitHub Comments

2 reactions
RedXanadu commented, Sep 1, 2021

Okay, now I understand what you mean 😃

I agree with you: adding [\\\\] is incorrect. I originally thought the rule was looking for the string representation \x00 and not a real NUL byte.


If the rule is looking for a NUL byte, should the end of the pattern be: …|;?\x00) ?

I’ve tested how the pattern from rule 942440 is interpreted using the tools pcre4msc2 and pcre4msc3. Because of the differences with escaping, the result is different:

$ echo | ./src/pcre4msc2 -d regexes/942440_1.txt 

RAW pattern:
============
(?:/\*!?|\*/|[';]--|--[\s\r\n\v\f]|--[^-]*?-|[^&-]#.*?[\s\r\n\v\f]|;?\\x00)

ESCAPED pattern:
================
(?:/\*!?|\*/|[';]--|--[\s\r\n\v\f]|--[^-]*?-|[^&-]#.*?[\s\r\n\v\f]|;?\x00)
$ echo | ./src/pcre4msc3 -d regexes/942440_1.txt 

PATTERN:
========
(?:/\*!?|\*/|[';]--|--[\s\r\n\v\f]|--[^-]*?-|[^&-]#.*?[\s\r\n\v\f]|;?\\x00)

ModSecurity v3 leaves the pattern with the extra \.


If the pattern is modified to …|;?\x00) then it ends up being the same:

$ echo | ./src/pcre4msc2 -d 942440_1_modified.txt 

RAW pattern:
============
(?:/\*!?|\*/|[';]--|--[\s\r\n\v\f]|--[^-]*?-|[^&-]#.*?[\s\r\n\v\f]|;?\x00)

ESCAPED pattern:
================
(?:/\*!?|\*/|[';]--|--[\s\r\n\v\f]|--[^-]*?-|[^&-]#.*?[\s\r\n\v\f]|;?\x00)
$ echo | ./src/pcre4msc3 -d 942440_1_modified.txt 

PATTERN:
========
(?:/\*!?|\*/|[';]--|--[\s\r\n\v\f]|--[^-]*?-|[^&-]#.*?[\s\r\n\v\f]|;?\x00)

So, that modified pattern would work the same on both platforms, I think.

I’m not sure how to test it for real: I’ve struggled to submit a test request containing a NUL byte using cURL.

1 reaction
RedXanadu commented, Aug 26, 2021

There is also the question of patterns designed to work with Apache, which do not work with libmodsecurity, e.g. from rule 920460:

(?:^|[^\\\\])\\\\[cdeghijklmpqwxyz123456789]
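
To illustrate why a pattern like this behaves differently, here is a minimal Python sketch. It relies on a simplified assumption: Apache/mod_security2 collapses each \\ pair into a single \ before compiling the regex, while libmodsecurity compiles the pattern as written. The helper name is hypothetical and Python's re module stands in for PCRE.

```python
import re

def apache_v2_view(pattern: str) -> str:
    # Simplified assumption: every "\\" pair in the rule becomes a single "\".
    return pattern.replace("\\\\", "\\")

# The pattern quoted above from rule 920460.
RULE_920460_PATTERN = r"(?:^|[^\\\\])\\\\[cdeghijklmpqwxyz123456789]"

apache_pattern = apache_v2_view(RULE_920460_PATTERN)
libmodsecurity_pattern = RULE_920460_PATTERN  # assumed to be used unchanged

one_backslash = "a\\c"       # three characters: a, \, c
two_backslashes = "a\\\\c"   # four characters: a, \, \, c

for label, text in (("one backslash", one_backslash),
                    ("two backslashes", two_backslashes)):
    v2 = re.search(apache_pattern, text) is not None
    v3 = re.search(libmodsecurity_pattern, text) is not None
    print(f"{label}: apache={v2} libmodsecurity={v3}")
```

Under this model the Apache view matches a single backslash followed by one of the listed characters, whereas the libmodsecurity view only matches when two consecutive backslashes are present.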

Actually, I wonder if this should be split into separate issues for wider discussion, as it may be too much to put into one issue?

(I’ve reduced the scope of this issue to just look at rule 941170 and rule 942440 for now, to make it less messy.)

@fzipi I couldn’t get the null character \x00 test to work, either for Apache or libmodsecurity (I forget which one didn’t work.) I’ll re-test and post a result here when I can.
