discussion: capa JSON format

I have the following questions/comments after changing the IDA plugin to use the new JSON format:

Does it make sense to define (if not done already) a JSON schema for the new format?
- Pros: Schema would allow for easy validation of the format and serve as documentation for developers wanting to ingest the data into their systems
- Cons: Time and effort
Does it make sense to include the original rule content for match? This data can be found in the source field of the parent match, but looking up the original source that way isn’t as convenient.
- Pros: Convenience when parsing/displaying rule data for match
- Cons: Duplicate data in output
Does it make sense to include the locations for range? These locations, and the corresponding context (e.g. the instruction at a location), used to be displayed in the IDA plugin.
- Pros: Locations can be rendered providing additional context
- Cons: More data in output
Does it make sense to include additional metadata (e.g. hash value, entry point, etc.) specific to the binary file from which the output was produced?
- Pros: Systems looking to ingest the data could render the additional context, and the metadata could be used to map the output back to the original binary
- Cons: More data in output and more work on the extractor end to collect the metadata
Does it make sense to include feature comments, e.g. PAGE_EXECUTE_READWRITE from number: 0x40 = PAGE_EXECUTE_READWRITE?
- Pros: Additional context/comments can be rendered
- Cons: More data in output
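
To make the schema question concrete, here is a minimal sketch of what validation could look like on the consumer side. It is a hand-rolled structural check using only the Python standard library, standing in for a real JSON Schema validator. The field names `meta` and `rules` and the helper `validate_document` are assumptions made for illustration and are not taken from capa's actual output format.

```python
import json

# Hypothetical, simplified "schema": top-level key -> expected Python type.
# These names are illustrative only, not capa's real format.
REQUIRED_FIELDS = {
    "meta": dict,
    "rules": dict,
}


def validate_document(doc):
    """Return a list of problems; an empty list means the document passed."""
    if not isinstance(doc, dict):
        return ["top-level value must be an object"]
    problems = []
    for key, expected in REQUIRED_FIELDS.items():
        if key not in doc:
            problems.append(f"missing required field: {key}")
        elif not isinstance(doc[key], expected):
            problems.append(f"field {key} should be of type {expected.__name__}")
    return problems


good = json.loads('{"meta": {}, "rules": {}}')
bad = json.loads('{"meta": []}')
print(validate_document(good))
print(validate_document(bad))
```

A published schema would replace the hard-coded table above and would double as the documentation mentioned under the pros.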

Top GitHub Comments

mr-tz commented, Jun 30, 2020

Nice suggestions. I agree with Willi’s thoughts. For metadata, additional fields could include:

- base address
- used extractor (vivisect, IDA for now)
- file format (pe, sc, etc.)
- possibly other command line options (rules path, tag, etc.)
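
As a rough sketch of how such a metadata block might be assembled, the snippet below hashes the sample bytes and records the fields suggested above. The function `build_meta` and every key name are hypothetical, chosen for illustration, and do not reflect what capa emits.

```python
import hashlib


def build_meta(data, base_address, extractor, file_format):
    """Assemble a hypothetical metadata block for one analyzed sample."""
    return {
        # Hashes let a consumer map the output back to the original binary.
        "md5": hashlib.md5(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
        "base_address": base_address,
        "extractor": extractor,    # e.g. "vivisect" or "ida"
        "format": file_format,     # e.g. "pe" or "sc"
    }


# Placeholder bytes stand in for the contents of a real sample.
meta = build_meta(b"MZ\x90\x00", 0x400000, "vivisect", "pe")
print(meta)
```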

williballenthin commented, Jun 30, 2020

> Does it make sense to include feature comments e.g. PAGE_EXECUTE_READWRITE from number: 0x40 = PAGE_EXECUTE_READWRITE

Yes, @Ana06 is working on this in #39. Feature instances will have an optional description field that will contain this information.
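
One way to picture an optional description on a feature instance is sketched below. The `NumberFeature` class, the `to_record` and `render` helpers, and the record layout are assumptions for illustration only. The actual design is whatever lands in #39.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NumberFeature:
    value: int
    description: Optional[str] = None  # optional human-readable comment


def to_record(feature):
    """Serialize to a JSON-ready dict, omitting the description when unset."""
    record = {"type": "number", "value": feature.value}
    if feature.description is not None:
        record["description"] = feature.description
    return record


def render(record):
    """Format a record for display, appending the description if present."""
    text = f"number: 0x{record['value']:x}"
    if "description" in record:
        text += f" = {record['description']}"
    return text


with_comment = to_record(NumberFeature(0x40, "PAGE_EXECUTE_READWRITE"))
without_comment = to_record(NumberFeature(0x40))
print(render(with_comment))
print(render(without_comment))
```

Omitting the key when no description is set keeps the output small, which addresses the "more data in output" concern for features that carry no comment.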
