Overly broad scanners matching
_Issue originally created by user ygrek on 2016-07-14 02:32:01. Link to original issue: https://github.com/SpiderLabs/owasp-modsecurity-crs/issues/406._
Hello,
#258 added AhrefsBot (which is a general-purpose crawler, not a vulnerability scanner) to scanners-user-agents.data,
which contradicts the purpose of that list, afaics:
#
# -=[ Vulnerability Scanner Checks ]=-
#
# These rules inspect the default User-Agent and Header values sent by
# various commercial and open source vuln scanners.
#
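For context, this is roughly how the CRS consumes that file: a rule matches the request's User-Agent header against the phrase list using ModSecurity's @pmFromFile operator, so any entry in scanners-user-agents.data blocks every request carrying that substring. A minimal sketch, with an illustrative rule id and action list rather than the exact CRS rule text:

```
# Minimal sketch of the scanner user-agent check, assuming
# scanners-user-agents.data sits next to this rule file.
# Rule id and actions are illustrative, not the exact CRS rule.
SecRule REQUEST_HEADERS:User-Agent "@pmFromFile scanners-user-agents.data" \
    "id:913100,\
    phase:1,\
    t:lowercase,\
    deny,\
    log,\
    msg:'Found User-Agent associated with security scanner'"
```

Because @pmFromFile is a plain phrase match, a crawler like AhrefsBot is indistinguishable from a vuln scanner once its user agent lands in the file, which is exactly the complaint here.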
People are using Ahrefs to explore their own sites and find known problems (duplicate pages, missing title tags, etc.). ModSecurity is often installed and configured by the hosting provider, with the actual website owner having no control over its configuration, so because of this rule owners lose the ability to use Ahrefs services on their own websites. This is rather unfortunate.
Please remove AhrefsBot from the list of vulnerability scanners.
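For operators who do control their ModSecurity configuration, the usual workaround is a runtime exclusion that disables the scanner check for this one user agent. A hedged sketch, assuming the scanner rule id is 913100 as in CRS 3.x; id 1000001 is an arbitrary local id:

```
# Sketch of a local exclusion: let AhrefsBot past the scanner check.
# Assumes the scanner rule id is 913100 (CRS 3.x); this rule must be
# loaded before the CRS rules so the ctl action can take effect.
SecRule REQUEST_HEADERS:User-Agent "@contains AhrefsBot" \
    "id:1000001,\
    phase:1,\
    pass,\
    nolog,\
    ctl:ruleRemoveById=913100"
```

This only helps admins who can edit the configuration, which is precisely what shared-hosting customers cannot do.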
Top GitHub Comments
User ygrek commented on 2016-07-15 07:58:27:
@dune73 I cannot promise that I will do a comprehensive filtering of all scanner user-agents, and I don't know the CRS structure well enough, but I see that there was a list of marketing bots (which also included googlebot) that was removed in 2014; that would be a proper fit, if there is an analogue of such a list in 3.0.0.
@csanders-git Reputation is an interesting topic. I believe that for googlebot you are evaluating the reputation of Google as a whole, not of googlebot itself (which is actually known to be faked constantly). In that case, Ahrefs is one of the top tools for online marketing (I think it is present in every “top 10 SEO tools” collection one can find). Also consider that Google is a well-known public company, while many bots are run by smaller companies with niche businesses that one may not know about without doing some research.
The problem with reporting false positives is that users hit by this problem may (1) not notice that it happened, (2) not be able to understand why it happens, and (3) even with that knowledge, have no permission to fix things, as on shared hosting where Apache is configured by the hosting provider. In the best case, complaints will be sent to the bot authors, not here.
What bothers me is that sysadmins enable CRS thinking of it as protection from security breaches and hacks, but instead they end up imposing arbitrary censorship on their users based on crowdsourced criteria…
All in all, given the above considerations, I understand the best way to resolve this would be for me to create a pull request and continue the discussion, with more specifics, there.
Thank you.
User ygrek commented on 2016-07-14 05:28:24:
Said the internets. Seems legit (not really).
I have one more candidate for this list of vuln scanners: googlebot. Why is it not included? There are actually documented cases of Google results being used to search for vulnerabilities, unlike AhrefsBot. What are the criteria for inclusion in this scanners list?