discussion: capa JSON format
I have the following questions/comments after changing the IDA plugin to use the new JSON format:
- Does it make sense to define (if not done already) a JSON schema for the new format?
  - Pros: A schema would allow for easy validation of the format and serve as documentation for developers wanting to ingest the data into their systems
  - Cons: Time and effort
- Does it make sense to include the original rule content for `match`? This data can be found in the `source` field of the parent `match`, but finding the original source this way isn't as convenient.
  - Pros: Convenience when parsing/displaying rule data for `match`
  - Cons: Duplicate data in output
- Does it make sense to include the locations for `range`? These locations, and corresponding context, e.g. the instruction at a location, used to be displayed in the IDA plugin.
  - Pros: Locations can be rendered, providing additional context
  - Cons: More data in output
- Does it make sense to include additional metadata, e.g. hash value, entry point, etc., specific to the binary file from which the output was produced?
  - Pros: Systems looking to ingest the data could render the additional context; metadata could be used to map output back to the original binary
  - Cons: More data in output and more work on the extractor end to get the metadata
- Does it make sense to include feature comments, e.g. `PAGE_EXECUTE_READWRITE` from `number: 0x40 = PAGE_EXECUTE_READWRITE`?
  - Pros: Additional context/comments can be rendered
  - Cons: More data in output
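
To make the schema question concrete, here is a minimal sketch of what schema-style validation of a result document could look like. The key names (`meta`, `rules`, `source`, `matches`) and the helper `validate_report` are assumptions made for illustration, not the actual capa output format. A real implementation would more likely publish a JSON Schema document and rely on an existing validator rather than hand-written checks.

```python
import json

# Hypothetical, simplified shape of a result document. The keys below are
# assumptions for illustration only, not the actual capa output format.
REQUIRED_TOP_LEVEL = {"meta": dict, "rules": dict}
REQUIRED_PER_RULE = {"meta": dict, "source": str, "matches": dict}


def validate_report(doc):
    """Return a list of human-readable problems; an empty list means valid."""
    if not isinstance(doc, dict):
        return ["document: expected an object"]
    problems = []
    # Check that each required top-level key exists and has the right type.
    for key, expected in REQUIRED_TOP_LEVEL.items():
        if not isinstance(doc.get(key), expected):
            problems.append(f"{key}: expected {expected.__name__}")
    # Check the required keys of every rule entry.
    rules = doc.get("rules")
    if isinstance(rules, dict):
        for name, rule in rules.items():
            if not isinstance(rule, dict):
                problems.append(f"rules[{name!r}]: expected an object")
                continue
            for key, expected in REQUIRED_PER_RULE.items():
                if not isinstance(rule.get(key), expected):
                    problems.append(
                        f"rules[{name!r}].{key}: expected {expected.__name__}"
                    )
    return problems


sample = json.loads("""
{
  "meta": {},
  "rules": {
    "example rule": {"meta": {}, "source": "rule text goes here", "matches": {}}
  }
}
""")
print(validate_report(sample))
```

Returning a list of problems instead of raising on the first one lets a consumer report every structural issue in a single pass, which is also how most schema validators behave.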
Issue Analytics
- Created 3 years ago
- Comments: 8
Top GitHub Comments
Nice suggestions. I agree with Willi’s thoughts. For meta data, additional fields could include:
Yes, @Ana06 is working on this in #39.
`feature` instances will have an optional field `description` that will contain this information.
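
As a rough illustration of how such an optional field might be produced, the sketch below splits a feature written as `0x40 = PAGE_EXECUTE_READWRITE` (the example from the issue) into a value and a description. The function name `parse_number_feature` and the layout of the returned dictionary are hypothetical; only the field name `description` comes from the comment above.

```python
def parse_number_feature(text):
    """Split 'VALUE = DESCRIPTION' into a dict; the description is optional."""
    value, sep, description = text.partition("=")
    # int(..., 0) accepts both decimal and 0x-prefixed hexadecimal literals.
    feature = {"type": "number", "value": int(value.strip(), 0)}
    # Only emit the optional field when a non-empty description was given.
    if sep and description.strip():
        feature["description"] = description.strip()
    return feature


print(parse_number_feature("0x40 = PAGE_EXECUTE_READWRITE"))
print(parse_number_feature("0x40"))
```

Leaving the key out entirely when no description exists keeps the output compact, which addresses the "more data in output" concern raised in the issue.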