Debating use-cases/scenarios and performance implications of `threat.enrichments[].indicator` vs `threat.indicator`
This is a verbatim copy of an external conversation, migrated here for transparency.
@rylnd writes: Hey @djptek, I was hoping we could continue the conversation from RFC Threat Integration Stage 3 - merged here:
> my concern is that the indicators (in my use case) are really separate events

I’m struggling to wrap my head around this, some examples might be helpful.

> and encapsulating them all in a single event using an array:
> - reduces potential for aggregations across similar events
> - prevents aggregation across events e.g. where you might wish to compare `threat.enrichments[].indicator.type` against `threat.indicator.type`

I don’t disagree that the nested `threat.enrichments[]` makes aggregations more difficult, but without specific examples of what you’re trying to do it’s hard to say what the correct approach/structure should be.

In my mind, an event with `threat.enrichments` means: “this event matched multiple indicators; the details of those indicators, and how they were matched are as follows.” An event with `threat.indicator` is simply an indicator. An enriched event references one or more indicators, so aggregating across both types of documents implies to me that you’re trying to do some additional enrichment/joining?
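To make the aggregation difficulty concrete, here is a minimal sketch of the two request bodies involved, written as Python dictionaries. It assumes `threat.enrichments` is mapped with Elasticsearch's `nested` field type (consistent with the "nested" wording above, but still an assumption about the mapping) and that the indicator type fields are keyword fields. The variable names and the aggregation labels `types` and `enrichments` are hypothetical.

```python
# Indicator documents: a plain terms aggregation on a top-level field path.
indicator_types_agg = {
    "size": 0,
    "aggs": {
        "types": {"terms": {"field": "threat.indicator.type"}},
    },
}

# Enriched events: the terms aggregation has to be wrapped in a nested
# aggregation scoped to the threat.enrichments path (assumed nested mapping).
enrichment_types_agg = {
    "size": 0,
    "aggs": {
        "enrichments": {
            "nested": {"path": "threat.enrichments"},
            "aggs": {
                "types": {
                    "terms": {"field": "threat.enrichments.indicator.type"}
                },
            },
        },
    },
}
```

The two bodies also return their buckets at different depths in the response, which is part of why a single aggregation spanning both document types is awkward.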
Regarding your question:
> Did you consider the possibility of denormalising multiple indicators into separate events (with duplicate parent metadata) as an alternative to adding the `threat.enrichments[]` array?

If I’m understanding correctly, you’re proposing that an event matching two indicators would actually be two events, one for each indicator? Since our documents represent userland events (and not e.g. a “matching” itself), creating one per indicator seems like an unnecessary duplication and a departure from the reality of the system (at least as I am envisioning it).

Since an event matches N indicators, `threat.enrichments[]` represents that relationship. We had at one point discussed collecting a separate index of “matches”, but the burden of having to “join” to that index whenever alerts were retrieved was dropped in favor of the nested document structure. I think we still have opportunity to pursue this implementation, though, if that may help.
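The two shapes under discussion can be sketched in Python. This is a minimal illustration, not the Security Solution's implementation: the sample event, its field values, and the `denormalise` helper are hypothetical, and placing each split-out indicator under `threat.indicator` is only one possible reading of the proposal.

```python
from copy import deepcopy

# One event that matched two indicators (values are made up for illustration).
enriched_event = {
    "event": {"id": "evt-1"},
    "source": {"ip": "203.0.113.7"},
    "threat": {
        "enrichments": [
            {"indicator": {"type": "ipv4-addr", "ip": "203.0.113.7"}},
            {"indicator": {"type": "url"}},
        ]
    },
}


def denormalise(event):
    """Return one document per enrichment, each repeating the parent metadata."""
    enrichments = event.get("threat", {}).get("enrichments", [])
    if not enrichments:
        # Nothing to split: hand back a copy of the event unchanged.
        return [deepcopy(event)]
    docs = []
    for enrichment in enrichments:
        doc = deepcopy(event)
        del doc["threat"]["enrichments"]
        # Assumption: the indicator moves to the top-level threat.indicator path.
        doc["threat"]["indicator"] = deepcopy(enrichment["indicator"])
        docs.append(doc)
    return docs


docs = denormalise(enriched_event)
```

Here `docs` holds two documents that share `event.id` and `source.ip` and differ only in `threat.indicator`, which is the duplication of parent metadata being debated.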
@djptek writes:
> creating one per indicator seems like an unnecessary duplication

Denormalising data prior to storage in Elasticsearch is best practice in the majority of cases where you might want to run an aggregation. There is a lot of compression going on to mitigate the impact of duplication and any additional storage cost ought to be more than offset by faster aggregations.

> examples

for example, if I wanted to aggregate on `threat.indicator.type` or `threat.indicator.ip` and some of my events used this field while others used `threat.enrichments.indicator.type` or `threat.enrichments.indicator.ip` those fields aren’t directly comparable.

I’m aware that `related.ip` exists and I’ve written some Painless to copy the values there, however, that I do see as unnecessary duplication, rather than denormalisation, since by using a new field I’m effectively bypassing all the magic of columnar keyword compression that’s going on in Elasticsearch.
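The "not directly comparable" point can be illustrated with a small in-memory stand-in for a terms aggregation. This is a sketch under stated assumptions: the sample documents and the helpers `values_at` and `count_terms` are hypothetical, and the counting only mimics what a per-field aggregation would see, it does not call Elasticsearch.

```python
from collections import Counter

# Made-up sample documents: one indicator document and one enriched event.
indicator_doc = {"threat": {"indicator": {"type": "ipv4-addr", "ip": "203.0.113.7"}}}
enriched_doc = {
    "threat": {
        "enrichments": [
            {"indicator": {"type": "ipv4-addr", "ip": "203.0.113.7"}},
            {"indicator": {"type": "url"}},
        ]
    }
}


def values_at(doc, path):
    """Collect every value found at a dotted path, descending into lists."""
    nodes = [doc]
    for key in path.split("."):
        children = []
        for node in nodes:
            child = node.get(key) if isinstance(node, dict) else None
            if isinstance(child, list):
                children.extend(child)
            elif child is not None:
                children.append(child)
        nodes = children
    return nodes


def count_terms(docs, path):
    """Count values per term for a single field path across all documents."""
    return Counter(value for doc in docs for value in values_at(doc, path))


sample_docs = [indicator_doc, enriched_doc]
by_indicator = count_terms(sample_docs, "threat.indicator.type")
by_enrichment = count_terms(sample_docs, "threat.enrichments.indicator.type")
combined = by_indicator + by_enrichment  # needs an explicit second pass
```

Counting on either path alone misses the documents that use the other one; a combined view only appears after a second aggregation and a merge step.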
@rylnd writes:
Understood about the denormalization best practice. However, from the security solution perspective, having an alert with two `threat.enrichments[]` is NOT equivalent to having two alerts, each with single indicators.
An alert represents a rule detecting something in the source data that merits investigation. Rules do not generate duplicate alerts, so per-indicator alerts would break this paradigm and many workflows.
> for example, if I wanted to aggregate on `threat.indicator.type` or `threat.indicator.ip` and some of my events used this field while others used `threat.enrichments.indicator.type` or `threat.enrichments.indicator.ip` those fields aren’t directly comparable.

I’m still unclear on what this aggregation would represent, since these are two types of documents; `threat.indicator` documents represent indicators, while documents with `threat.enrichments` represent any event that’s been enriched with indicators.
@djptek writes: Thanks Ryland
> However, from the security solution perspective, having an alert with two `threat.enrichments[]` is NOT equivalent to having two alerts, each with single indicators.

If the data for a unique alert with two `threat.enrichments[]` were to be denormalized, this doesn’t equate to two alerts. There would be two Elasticsearch documents, each with a unique ID and each sharing a common Alert ID. So there is still only one alert.
In isolation, each document represents a unique indicator.
Aggregated on the basis of their common Alert ID, the set of documents represents the alert + indicators. My goal in suggesting this is to ensure that the system delivers best-of-class performance both at ingest and query/aggregation time.
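A minimal sketch of that regrouping, assuming each denormalised document carries its own document ID plus a shared alert identifier. The field name `alert_id`, the sample values, and the `indicators_by_alert` helper are hypothetical and stand in for whatever the common Alert ID would be in practice.

```python
from collections import defaultdict

# Denormalised documents: unique document IDs, shared alert ID (made-up values).
documents = [
    {"_id": "doc-1", "alert_id": "alert-A", "threat": {"indicator": {"type": "ipv4-addr"}}},
    {"_id": "doc-2", "alert_id": "alert-A", "threat": {"indicator": {"type": "url"}}},
    {"_id": "doc-3", "alert_id": "alert-B", "threat": {"indicator": {"type": "file"}}},
]


def indicators_by_alert(docs):
    """Group per-indicator documents back into one entry per alert."""
    grouped = defaultdict(list)
    for doc in docs:
        grouped[doc["alert_id"]].append(doc["threat"]["indicator"])
    return dict(grouped)


alerts = indicators_by_alert(documents)
```

Three documents collapse into two alerts here, so the alert count is unchanged by the denormalisation, at the cost of a grouping step whenever the alert view is needed.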
Is it OK with you if I copy this Slack conversation wholesale into a new GitHub issue referencing the original? :elasticheart:
> Is it OK with you if I copy this Slack conversation wholesale into a new GitHub issue referencing the original?

Of course, that sounds fantastic!
Top GitHub Comments
Thanks for sharing more context! I’m probably not the best person to reason about if it’s a good model or not as I haven’t worked on the alerts and how they are enriched with indicators. I only have context about the indicators themselves. In the Security Solution “alerts enriched with indicators” and indicators themselves are two separate entities. I will still share my thoughts hoping they are relevant to the discussion.

First of all, if we follow the current reality of the Security Solution your example would look more like
meaning that there are two indicators ingested from some source and then alerts created by an indicator match rule enriched with these indicators if a match is found in the source events.

One thing for sure is that this data model I guess will require a complete redo of all things related to Alerts. Then the question is what is the goal of this particular discussion?

If it is to find a solution to your problem at hand with 3rd party integration, maybe it is a good idea to model things the same way they are modeled currently in the Security Solution. Meaning if 3rd party has alerts enriched with indicators, create indicators in addition to alerts enriched with these indicators. Then we arrive at the concerns you have regarding the dashboards and aggregations and here is where I would really like to learn what dashboards and aggregations you are building as they might be very relevant to our team’s scope. As mentioned we are working on the Threat Intelligence capabilities around Indicators of Compromise and want to learn more about everything related to it and your use case seems to be very new to us (building dashboards around indicators and alerts enriched with indicators).

If the goal of the discussion is to propose and discuss the future state - then I think @rylnd is the right person to ask for feedback on the proposed model as he has much more context about alerts and enrichments.
@djptek happy to chat about your use case a bit more. If having a call works for you, feel free to schedule something!