Refactor the CRS scoring variables

The variables used to aggregate the anomaly scores are used very inconsistently (see also: https://github.com/coreruleset/coreruleset/issues/1896).

This becomes very apparent when comparing rules 949110 and 959100 with respect to how anomaly_score and inbound_anomaly_score are used, compared to anomaly_score and outbound_anomaly_score. The way the variables are named appears to be the complete inverse of how they are used. This is why the following lines of these rules are so different:

# 949110
SecRule TX:ANOMALY_SCORE "@ge %{tx.inbound_anomaly_score_threshold}" \
[...]
    setvar:'tx.inbound_anomaly_score=%{tx.anomaly_score}'"

# 959100
SecRule TX:OUTBOUND_ANOMALY_SCORE "@ge %{tx.outbound_anomaly_score_threshold}" \
[...]
    setvar:'tx.anomaly_score=+%{tx.outbound_anomaly_score}'"

For incoming scores, anomaly_score is used to aggregate the scores from the different paranoia levels and is then assigned to inbound_anomaly_score. For outgoing scores, outbound_anomaly_score is used to aggregate the scores from the different paranoia levels and is then added to anomaly_score.
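The asymmetry described above can be modeled in a few lines of Python. This is only a sketch of the data flow as the two rule excerpts describe it, not of the actual rule engine: the function name is made up, paranoia-level gating is ignored, and the default thresholds (5 inbound, 4 outbound) are assumptions for illustration.

```python
def run_blocking_rules(inbound_pl, outbound_pl,
                       inbound_threshold=5, outbound_threshold=4):
    """Illustrative model of rules 949110 and 959100 (thresholds are assumed)."""
    tx = {"anomaly_score": 0, "outbound_anomaly_score": 0}

    # Inbound: per-PL scores are aggregated into anomaly_score ...
    tx["anomaly_score"] += sum(inbound_pl)
    # ... and 949110 copies the total into inbound_anomaly_score on a threshold hit.
    if tx["anomaly_score"] >= inbound_threshold:
        tx["inbound_anomaly_score"] = tx["anomaly_score"]

    # Outbound: per-PL scores are aggregated into outbound_anomaly_score ...
    tx["outbound_anomaly_score"] += sum(outbound_pl)
    # ... and 959100 adds that total to anomaly_score on a threshold hit.
    if tx["outbound_anomaly_score"] >= outbound_threshold:
        tx["anomaly_score"] += tx["outbound_anomaly_score"]
    return tx


result = run_blocking_rules([5, 0, 0, 0], [4, 0, 0, 0])
```

In this model anomaly_score is the inbound aggregate at first and a combined total later, which is the ambiguity the issue is about.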

Aside from the confusion, this causes no functional problem in itself. However, the confusion can lead to problems later on because the semantics of the variables are unclear. Rule 980120 is one example.

The variable INBOUND_ANOMALY_SCORE remains unset if anomaly_score is less than inbound_anomaly_score_threshold in phase 2. As a result, the log entry for this rule, if it is printed at all, contains contradictory information:

Total Inbound Score: 0
individual paranoia level scores: 10, 0, 0, 0

[Thu Nov 25 14:30:48.269676 2021] [:error] [pid 344927:tid 140178457110272] [client 127.0.0.1:57402] [client 127.0.0.1] ModSecurity: Warning. Operator GT matched 1 at TX:executing_anomaly_score. [file "rules/RESPONSE-980-CORRELATION.conf"] [line "77"] [id "980120"] [msg "Inbound Anomaly Score (Total Inbound Score: 0 - SQLI=0,XSS=0,RFI=0,LFI=0,RCE=10,PHPI=0,HTTP=0,SESS=0): individual paranoia level scores: 10, 0, 0, 0"] [ver "OWASP_CRS/3.4.0-dev"] [tag "event-correlation"] [hostname "localhost"] [uri "/index.html"] [unique_id "YZ@QCFM9g-bdcxDv4w5IMgAAAAA"]
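A small Python sketch of how such a message can arise. The function name and the threshold of 20 are hypothetical (the log above does not show the configured threshold), and an unset variable is modeled as reading 0 to match the log line.

```python
def inbound_report(pl_scores, inbound_threshold):
    """Model: inbound_anomaly_score is only set when the threshold is reached."""
    tx = {"anomaly_score": sum(pl_scores)}
    if tx["anomaly_score"] >= inbound_threshold:
        tx["inbound_anomaly_score"] = tx["anomaly_score"]
    # Assumption: an unset variable shows up as 0 in the message.
    total = tx.get("inbound_anomaly_score", 0)
    per_pl = ", ".join(str(score) for score in pl_scores)
    return f"Total Inbound Score: {total}, individual paranoia level scores: {per_pl}"


message = inbound_report([10, 0, 0, 0], inbound_threshold=20)
```

Below the threshold, the total reads 0 even though PL 1 scored 10.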

The correlation rules in general are very difficult to understand because of this.

The inconsistencies go even further than this.

For example, for incoming scoring, the scores are increased as follows:

    setvar:'tx.anomaly_score_pl1=+%{tx.critical_anomaly_score}'"

For outgoing scoring on the other hand, there are always two variables that are increased:

    setvar:'tx.outbound_anomaly_score_pl1=+%{tx.error_anomaly_score}',\
    setvar:'tx.anomaly_score_pl1=+%{tx.error_anomaly_score}'

This is not only redundant but also allows for further inconsistencies later on. For example, if we compare the blocking rules for incoming and outgoing, as well as for early and regular blocking, we get three different ways to count the scores:

# incoming, early
SecRule TX:PARANOIA_LEVEL "@ge 1" \
    "id:949052,\
    phase:1,\
    pass,\
    t:none,\
    nolog,\
    setvar:'tx.anomaly_score=+%{tx.anomaly_score_pl1}'"

SecRule TX:PARANOIA_LEVEL "@ge 1" \
    "id:949060,\
    phase:2,\
    pass,\
    t:none,\
    nolog,\
    setvar:'tx.anomaly_score=+%{tx.anomaly_score_pl1}'"

# outgoing, early
SecRule TX:PARANOIA_LEVEL "@ge 1" \
    "id:959052,\
    phase:3,\
    pass,\
    t:none,\
    nolog,\
    setvar:'tx.outbound_anomaly_score=+%{tx.anomaly_score_pl1}'"

# outgoing, regular
SecRule TX:PARANOIA_LEVEL "@ge 1" \
    "id:959060,\
    phase:4,\
    pass,\
    t:none,\
    nolog,\
    setvar:'tx.outbound_anomaly_score=+%{tx.outbound_anomaly_score_pl1}'"
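To see why the difference between 959052 and 959060 matters, here is a hedged Python model restricted to PL 1. It assumes the setvar lines quoted above are the only writes to these variables and that nothing resets them between phases, which the excerpts do not show; the function name is made up.

```python
def outbound_totals_pl1(inbound_hits, outbound_hits):
    """Compare what each outbound blocking rule would add, per the excerpts."""
    tx = {"anomaly_score_pl1": 0, "outbound_anomaly_score_pl1": 0}
    for score in inbound_hits:
        # Inbound rules only increase anomaly_score_pl1.
        tx["anomaly_score_pl1"] += score
    for score in outbound_hits:
        # Outbound rules increase both variables.
        tx["outbound_anomaly_score_pl1"] += score
        tx["anomaly_score_pl1"] += score
    return {
        "early (959052)": tx["anomaly_score_pl1"],
        "regular (959060)": tx["outbound_anomaly_score_pl1"],
    }


totals = outbound_totals_pl1(inbound_hits=[5], outbound_hits=[4])
```

Under these assumptions the early outbound sum would also contain inbound scores, while the regular one would not.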

To address the above-mentioned issues, the semantics and naming of the anomaly scoring variables have to be refactored across the entire rule set. To get consistent semantics and descriptive variable names, I propose the following:

Phase 1 & 2
- executing_inbound_anomaly_score_pl[1234]: Sum up the scores of triggered rules during request processing (indirectly limited by executing_paranoia_level)
- inbound_anomaly_score_pl[1234]: removed because not used
- inbound_anomaly_score: Sum up the values of executing_inbound_anomaly_score_pl[1234] during blocking evaluation according to paranoia_level (blocking decision is based on this)
Phase 3 & 4
- executing_outbound_anomaly_score_pl[1234]: Sum up the scores of triggered rules during response processing (indirectly limited by executing_paranoia_level)
- outbound_anomaly_score_pl[1234]: removed because not used
- outbound_anomaly_score: Sum up the values of executing_outbound_anomaly_score_pl[1234] during blocking evaluation according to paranoia_level (blocking decision is based on this)
Phase 5
- executing_inbound_anomaly_score: Sum up the values of executing_inbound_anomaly_score_pl[1234]
- executing_outbound_anomaly_score: Sum up the values of executing_outbound_anomaly_score_pl[1234]
- executing_anomaly_score_pl[1234]: Sum up the values of executing_inbound_anomaly_score_pl[1234] and executing_outbound_anomaly_score_pl[1234] per paranoia level
- executing_anomaly_score: removed because not used
- anomaly_score_pl[1234]: removed because not used
- anomaly_score: removed because not used
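The arithmetic behind this proposal can be sketched in Python. The variable names follow the list above; the function names and the sample scores are made up for illustration, and this is one possible reading of the proposal rather than an implementation of it.

```python
def blocking_score(executing_pl_scores, paranoia_level):
    """Sum executing_*_anomaly_score_pl[1..paranoia_level] for the blocking decision."""
    return sum(executing_pl_scores[:paranoia_level])


def phase5_totals(inbound_pl, outbound_pl):
    """Derive the phase 5 reporting variables from the per-PL scores."""
    return {
        "executing_inbound_anomaly_score": sum(inbound_pl),
        "executing_outbound_anomaly_score": sum(outbound_pl),
        "executing_anomaly_score_pl": [i + o for i, o in zip(inbound_pl, outbound_pl)],
    }


inbound = [10, 5, 0, 3]   # executing_inbound_anomaly_score_pl1..4 (sample values)
outbound = [0, 4, 0, 0]   # executing_outbound_anomaly_score_pl1..4 (sample values)
inbound_anomaly_score = blocking_score(inbound, paranoia_level=2)
totals = phase5_totals(inbound, outbound)
```

With paranoia_level 2, only the first two inbound levels count toward blocking, while the phase 5 totals cover everything that was executed.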

Users that log scoring variables might have to change their log formats as follows (alternatively, additional variables could be created to act as aliases to allow for a grace period):

anomaly_score -> inbound_anomaly_score
anomaly_score_pl[1234] -> executing_inbound_anomaly_score_pl[1234]
outbound_anomaly_score_pl[1234] -> executing_outbound_anomaly_score_pl[1234]
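A log format string could be migrated with a single regular-expression pass, as in the following Python sketch. The function name and the sample format string are hypothetical, and the sketch assumes lowercase variable names. It deliberately leaves outbound_anomaly_score, inbound_anomaly_score and the threshold variables untouched.

```python
import re

# Matches anomaly_score, anomaly_score_plN and outbound_anomaly_score[_plN],
# but not when embedded in a longer identifier such as inbound_anomaly_score.
PATTERN = re.compile(r"(?<![a-z0-9_])(outbound_)?anomaly_score(_pl[1-4])?(?![a-z0-9_])")


def _rename(match):
    outbound, pl = match.group(1), match.group(2)
    if pl:
        return f"executing_{outbound or 'inbound_'}anomaly_score{pl}"
    if outbound:
        return match.group(0)  # outbound_anomaly_score keeps its name
    return "inbound_anomaly_score"


def migrate_log_format(fmt):
    return PATTERN.sub(_rename, fmt)


old = "%{tx.anomaly_score} %{tx.anomaly_score_pl1} %{tx.outbound_anomaly_score_pl2}"
new = migrate_log_format(old)
```

A single pass avoids renaming a variable twice, which chained string replacements would risk because the new names contain the old ones.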

Top GitHub Comments

2 reactions

dune73 commented, Feb 6, 2022

Time to come to a conclusion here. @Studersi’s analysis was profound and the proposal useful, but many of us were still not quite happy: (a) because of the general unhappiness with my term “executing”, (b) because it felt like we had not understood the problem deeply enough, and (c) because the reporting in the 980xxx rules was not covered enough. I fall in category (c).

We talked about this a lot in the meantime and nobody felt fit to refine the existing proposal or to take a decision.

But then I got the chance to trade in the migration of the DoS rule to a plugin against me working on this issue. Thanks @RedXanadu.

So here we go.

The variables

The discussion favored removing the executing prefix, and there was a preference for detection instead. I think it is a good move to lean on the engine's terminology and to settle on blocking and detectionOnly.

Given this is going to be a major release, we can clear out all ambiguities and I propose to add the blocking prefix to the paranoia level variable. This gives us

blocking_paranoia_level : The basic PL variable. This is the PL we execute and block.
detectiononly_paranoia_level : This is greater than or equal to blocking_paranoia_level and defines the level up to which rules are executed, potentially without being used for the blocking decision.

And now for the simplification: Let’s remove all prefixes from the scoring variables. We simply do this as follows:

incoming_anomaly_score_pl1 : Sum of the anomaly scores of the rules triggered during execution at PL 1
incoming_anomaly_score_pl2 : Sum of the anomaly scores of the rules triggered during execution at PL 2
incoming_anomaly_score_pl3 : Sum of the anomaly scores of the rules triggered during execution at PL 3
incoming_anomaly_score_pl4 : Sum of the anomaly scores of the rules triggered during execution at PL 4

And then - take note here! - we sum these scores up to the blocking_paranoia_level and write the result into:

incoming_anomaly_score : Anomaly score used for the blocking decision.

Let’s say we have the following scores per paranoia level: 5-10-15-20 (thus rules were executed on all PLs):

If blocking_paranoia_level was 4, then the incoming_anomaly_score would be 50.
If blocking_paranoia_level was 3, then the incoming_anomaly_score would be 30.
If blocking_paranoia_level was 2, then the incoming_anomaly_score would be 15.
If blocking_paranoia_level was 1, then the incoming_anomaly_score would be 5.

So if you report the scores on a single line (extended access log format as in my netnea tutorials) you get the following at blocking_paranoia_level=2:

5-10-15-20 15

These numbers will allow you to determine that

blocking_paranoia_level was 2
detectiononly_paranoia_level was 4
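The worked example and the single-line format above can be reproduced with a short Python sketch. The function names are made up for illustration; the variable semantics follow this comment. Recovering the blocking level from the numbers is only unambiguous when the higher levels actually scored, so the helper returns all candidates.

```python
def score_line(pl_scores, blocking_paranoia_level):
    """Render per-PL scores plus the score used for the blocking decision."""
    total = sum(pl_scores[:blocking_paranoia_level])
    return "-".join(str(score) for score in pl_scores) + f" {total}"


def candidate_blocking_levels(line):
    """Return every blocking level consistent with a rendered score line."""
    per_pl, total = line.split()
    scores = [int(score) for score in per_pl.split("-")]
    return [pl for pl in range(1, len(scores) + 1)
            if sum(scores[:pl]) == int(total)]


line = score_line([5, 10, 15, 20], blocking_paranoia_level=2)
levels = candidate_blocking_levels(line)
```

For a line such as `5-0-0-0 5` every level is a candidate, so the numbers alone do not always identify the configured levels.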

Given that blocking and detectionOnly are the same level by default and raising the latter separately is advanced CRS business, I am sure we can guide people to understand the numbers even if 5-10-15-20 15 looks a bit odd at first sight.

The early blocking does not really affect any of this. The scores will simply be lower since only phase 1 is being covered.

And then we do the same for outgoing. Perfectly symmetrical.

As noted in the discussion, the JWall Audit Console makes use of anomaly_score. This console has been out of development for years - Christian Bockermann is a Professor now - but quite a few people and even integrators still use it.

I would hate to carry an unnecessary variable into a new major release because of some legacy software. But I think we can easily propose a rule in the documentation to set this. Or - very professional - provide a jwall-compatibility-plugin that sets the variable in question.

The reporting in the 980xxx rules

The whole 980xxx business is a mess. And I have been wanting to clean this up for a long time. If we are cleaning up the scoring, then let’s address this too.

Here is what we need to provide:

- Some people want to see anomaly scores in the error log even if a request was not blocked.
- We do separate category scores. We could think about including them in the blocking rules (949110 and friends), but maybe it’s better to leave that aside.
- On NGINX it is not possible to report anomaly scores in the access log. So if you want a full overview, you need to write it into the error log. This can also include requests with score 0, since you may want an easy way to determine how many requests passed the rule set without hitting any rules.

In order to satisfy all these needs, I suggest a variable anomaly_score_reporting with 4 possible values:

0 - No reporting beyond 949110 and friends
1 - Report the anomaly scores per PL and per category in 980xxx for those requests that have been blocked
2 - Report the anomaly scores per PL and per category in 980xxx for those requests that scored
3 - Report the anomaly scores per PL and per category in 980xxx for every request, including requests with scores 0/0 (inbound/outbound)
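The decision logic behind these four values could look like the following Python sketch. The function name and its parameters are hypothetical; only the variable name anomaly_score_reporting and the meaning of the values come from the proposal.

```python
def should_report(anomaly_score_reporting, blocked, inbound_score, outbound_score):
    """Decide whether the 980xxx rules would write a report for this request."""
    if anomaly_score_reporting == 0:
        return False                    # nothing beyond 949110 and friends
    if anomaly_score_reporting == 1:
        return blocked                  # blocked requests only
    if anomaly_score_reporting == 2:
        return inbound_score > 0 or outbound_score > 0   # requests that scored
    if anomaly_score_reporting == 3:
        return True                     # every request, including 0/0
    raise ValueError(f"unknown anomaly_score_reporting: {anomaly_score_reporting}")


# A request that scored 5 inbound but stayed below the blocking threshold:
reported_at_1 = should_report(1, blocked=False, inbound_score=5, outbound_score=0)
reported_at_2 = should_report(2, blocked=False, inbound_score=5, outbound_score=0)
```

Checking for value 0 first mirrors the remark that the whole reporting block can be skipped when reporting is disabled.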

Implementing this will be a bit of number juggling, but if you set the value to 0, then you can skip it all and I guess performance won’t be such an issue that way.

1 reaction

studersi commented, Feb 22, 2022

I think it is a good suggestion to consistently use inbound/outbound.