
discussion: false positives in vcrt functions

See original GitHub issue

there are a number of interesting rules, like manual PEB parsing, that fire on standard routines inserted by the MSVC compiler. typically we'd want to include such matches in the output, except that these normal runtime functions aren't actually doing anything nefarious (like the anti-vm behavior the rule might suggest), so reporting them is a false positive.

this leads to the desire to filter some known functions out of matching.

there are at least two obvious approaches:

  1. using existing capa logic/rules to match known functions (via features like count of basic blocks, count and/or distribution of mnemonics, etc.) and then suppressing those matches
  2. rely on the analysis backend to provide metadata about functions, such as auto-detected function name, and let rules match against this

both of these have tradeoffs, and it's not clear what we should do.

if we use capa infrastructure to match functions (a rough sketch follows the lists below),

pro:

  • need no new features or syntax, can do it today
  • works across all analysis backends
  • easy to inspect

con:

  • we have to maintain function signatures (not our goal here)
  • our signatures may not be as good as purpose-built tech, like FLIRT or Ghidra's database
  • matching N signatures against M functions may introduce performance issues (maybe, this is a guess)
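
A rough sketch of this first approach, assuming hypothetical helper types (FunctionProfile, Signature, filter_matches) rather than real capa internals: compute cheap structural features per function (basic block count, mnemonic distribution), compare them against a small table of known runtime routines, and drop capability matches that land in those functions.

```python
# Illustrative only: the types, the signature table, and the matching
# heuristic below are assumptions, not existing capa code.
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass
class FunctionProfile:
    va: int                 # function virtual address
    num_blocks: int         # basic block count
    mnemonics: Counter      # e.g. Counter({"mov": 12, "xor": 3, "call": 2})


@dataclass
class Signature:
    name: str
    num_blocks: int
    top_mnemonics: frozenset  # mnemonics expected to dominate the routine


KNOWN_RUNTIME = [
    # made-up profile for a __security_init_cookie-style stub
    Signature("__security_init_cookie", num_blocks=4,
              top_mnemonics=frozenset({"mov", "xor", "call"})),
]


def is_known_runtime(profile: FunctionProfile) -> bool:
    # naive structural match: same block count, and the function's three most
    # frequent mnemonics are all ones the signature expects
    top3 = {m for m, _ in profile.mnemonics.most_common(3)}
    return any(profile.num_blocks == sig.num_blocks and top3 <= sig.top_mnemonics
               for sig in KNOWN_RUNTIME)


def filter_matches(matches: dict[int, list[str]],
                   profiles: dict[int, FunctionProfile]) -> dict[int, list[str]]:
    # matches maps function VA -> capability names matched in that function;
    # drop everything reported inside a recognized runtime routine
    return {va: caps for va, caps in matches.items()
            if not is_known_runtime(profiles[va])}
```

The obvious risk, as noted in the cons above, is that hand-rolled structural profiles like this are brittle compared to FLIRT-style byte signatures.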

if we rely on the analysis backend to match functions (sketched after the pro/con lists below),

pro:

  • rely on backend expertise to do function id very well
  • less maintenance

con:

  • need new syntax, maybe like function/name: __init_iob
  • different analysis backends have different quality, e.g. IDA is very good, while vivisect has minimal coverage
  • different analysis backends may use different names/formats for function names that we have to normalize
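
A minimal sketch of this second approach, assuming a hypothetical function/name feature and a made-up normalization scheme; none of this is existing capa syntax or a real backend API:

```python
# Illustrative only: the feature class and normalization rules are assumptions.
from __future__ import annotations

import re


def normalize_name(raw: str) -> str:
    # strip backend-specific decoration so one rule matches all backends,
    # e.g. an IDA-style "j_" thunk prefix or "_0"/"_1" duplicate suffixes
    name = re.sub(r"^j_", "", raw)
    name = re.sub(r"_\d+$", "", name)
    return name.lower()


class FunctionNameFeature:
    """what a rule line like `function/name: __init_iob` might compile into"""

    def __init__(self, value: str):
        self.value = normalize_name(value)

    def evaluate(self, backend_names: set[str]) -> bool:
        # true if any backend-provided name normalizes to the rule's value
        return self.value in {normalize_name(n) for n in backend_names}


# usage: the same rule matches names reported slightly differently per backend
feature = FunctionNameFeature("__init_iob")
print(feature.evaluate({"j___init_iob"}))  # True
print(feature.evaluate({"__init_iob_0"}))  # True
print(feature.evaluate({"sub_401000"}))    # False
```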

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 5

Top GitHub Comments

1 reaction
williballenthin commented, Apr 30, 2021

this is closed in #446

1 reaction
re-fox commented, Dec 1, 2020

There are certainly pros and cons for function ID. Looking forward to a future where Capa has multiple backends, I’d have to guess that each disassembly library has varying levels of function ID. Something like pure capstone has absolutely no idea, whereas IDA has a huge backing library of FLIRT. Capa, however, is equipped to carry the weight of function ID, where the disassembler may be ignorant.

I would have no expectation for Capa/Feye/Community to maintain a large library of signatures, but for more common functions, it could be useful. Exhaustive support is where it becomes cumbersome.

One solution is to build support for known functions, where results are not rendered in the report if they match on a known function. One downside: if the known-function rule itself triggers on a false positive, you could introduce a false negative, which is not ideal.

Rendering the results but giving them the function/ prefix/namespace, like what @williballenthin was mentioning, is a similar option.

Another idea could be to divide the rules into different categories (this could be managed in the rule meta; see the sketch after this list):

  • Categorization (for signatures against certain malware families, highly specific)
  • Techniques (a majority of existing rules would sit in here)
  • Informational (metadata, language, compiler, etc…)
  • Library (known library functions - could be filtered out in output)

This is a hack of the existing namespacing that Capa already supports. Introducing malware categorization could be out of scope though. Yara may be better suited for that.
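
A rough sketch of how the category idea (and the earlier "don't render known-function matches" idea) might drive output filtering, assuming a hypothetical category key in each rule's meta; the rule names and meta layout here are invented for illustration:

```python
# Illustrative only: neither these rule names nor a "category" meta key are
# claimed to exist in capa today.
from __future__ import annotations

RULE_META = {
    "access PEB ldr_data": {"category": "technique"},
    "identify __security_init_cookie": {"category": "library"},
}


def render(matches: list[tuple[str, int]]) -> None:
    # matches: (rule name, function virtual address) pairs from a scan
    for rule_name, va in matches:
        category = RULE_META.get(rule_name, {}).get("category", "technique")
        if category == "library":
            continue  # known library code: match kept internally, not reported
        print(f"0x{va:08x}  {rule_name}")


render([
    ("access PEB ldr_data", 0x401000),
    ("identify __security_init_cookie", 0x402310),
])
```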

With that said, I do think there would be some utility in having support for the “most common” (that could be up for debate) functions. Something like __security_init_cookie comes to mind. I could be wrong, but I can’t imagine that those are updated on an aggressive cadence.

I’m of the mindset that rules and scanning are “cheap” and the more context the better.
