discussion: false positives in vcrt functions
there are a number of interesting rules, like manual PEB parsing, that fire on standard routines inserted by the MSVC compiler. typically we'd want to include these matches in the output, except that these normal runtime functions aren't doing anything nefarious (such as anti-VM, as the rule might suggest), so reporting them is a false positive.
this leads to the desire to filter out some known functions from matching.
there are at least two obvious approaches:
- use existing capa logic/rules to match known functions (like count of basic blocks, count and/or distribution of mnemonics, etc.) and then `not` the matches
- rely on the analysis backend to provide metadata about functions, such as an auto-detected function name, and let rules match against this
both of these have tradeoffs, and it's not clear what we should do.
if we use capa infrastructure to match functions,
pro:
- need no new features or syntax, can do it today
- works across all analysis backends
- easy to inspect
con:
- we have to maintain function signatures (not our goal here)
- our signatures may not be as good as purpose-built tech, like FLIRT or Ghidra's database
- matching N signatures against M functions may introduce performance issues (maybe, this is a guess)
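for illustration, the first approach might look roughly like the pair of rules sketched below: one rule fingerprints a known runtime routine using only existing features, and a behavioral rule excludes it via `not`/`match`. this is a hand-waved sketch: the rule names are invented, the feature values are approximate, and the snippet hasn't been checked against the current rule grammar.

```yaml
# sketch only: two separate rule files shown together; feature values are approximate
rule:
  meta:
    name: initialize MSVC security cookie   # invented rule name
    scope: function
  features:
    - and:
      - number: 0xBB40E64E                  # default x86 /GS security cookie constant
      - api: GetSystemTimeAsFileTime
      - api: QueryPerformanceCounter

# second rule, in its own file: a behavioral rule that suppresses the known routine
rule:
  meta:
    name: parse PEB manually                # invented rule name
    scope: function
  features:
    - and:
      - characteristic: peb access
      - not:
        - match: initialize MSVC security cookie
```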
if we rely on the analysis backends to match functions,
pro:
- rely on backend expertise to do function id very well
- less maintenance
con:
- need new syntax, maybe like `function/name: __init_iob`
- different analysis backends have different quality, e.g. IDA is very good while vivisect has minimal coverage
- different analysis backends may use different names/formats for function names that we have to normalize
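to make the second approach concrete, a rule could reference the backend-reported name directly, roughly as proposed above. the `function/name` syntax below is hypothetical (it is the feature being proposed in this issue, not something capa supports today), and the second symbol is just another plausible CRT name used for illustration.

```yaml
# hypothetical syntax: the function-name feature proposed here does not exist in capa today
rule:
  meta:
    name: initialize C runtime stdio        # invented rule name
    scope: function
  features:
    - or:
      - function/name: __init_iob                       # name as reported by the backend (e.g. FLIRT in IDA)
      - function/name: _initialize_narrow_environment   # another illustrative CRT symbol
```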
Top GitHub Comments
this is closed in #446
There are certainly pros and cons for function ID. Looking forward to a future where Capa has multiple backends, I’d have to guess that each disassembly library has varying levels of function ID. Something like pure capstone has absolutely no idea, whereas IDA has a huge backing library of FLIRT. Capa, however, is equipped to carry the weight of function ID, where the disassembler may be ignorant.
I would have no expectation for Capa/Feye/Community to maintain a large library of signatures, but for more common functions, it could be useful. Exhaustive support is where it becomes cumbersome.
One solution is to build support for known functions, where results are not rendered in the report if they match on a known function. One downside is that if the known-function rule triggers on a false positive, you could introduce a false negative, which is not ideal.
Rendering the results and giving them the `function/` prefix/namespace like what @williballenthin was mentioning is a similar option. Another idea could be to divide the rules into different categories (this could be managed in the rule meta):

- Categorization (for signatures against certain malware families, highly specific)
- Techniques (a majority of existing rules would sit in here)
- Informational (metadata, language, compiler, etc.)
- Library (known library functions - could be filtered out in output)

This is a hack of the existing namespacing that Capa already supports. Introducing malware categorization could be out of scope though; Yara may be better suited for that. With that said, I do think there would be some utility in having support for the "most common" (that could be up for debate) functions. Something like `__security_init_cookie` comes to mind. I could be wrong, but I can't imagine that those are updated on an aggressive cadence. I'm of the mindset that rules and scanning are "cheap" and the more context the better.
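As a rough sketch of that category idea, the grouping could ride on rule namespaces and meta. The `category` key below is hypothetical, and `lib: true` reflects (if I recall correctly) an existing capa meta flag for library-identification rules, so treat this as a shape to discuss rather than working syntax:

```yaml
# sketch: categories expressed via namespace and meta; the `category` key is hypothetical
rule:
  meta:
    name: identify __security_init_cookie   # invented rule name
    namespace: library/msvc                 # hypothetical "Library" namespace, filterable in output
    category: library                       # hypothetical meta key for the proposed categories
    lib: true                               # existing flag (IIRC) marking library-ID rules
    scope: function
  features:
    - and:
      - number: 0xBB40E64E                  # default x86 /GS security cookie constant
      - api: GetSystemTimeAsFileTime
```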