Debating use-cases/scenarios and performance implications of `threat.enrichments[].indicator` vs `threat.indicator` * n

This is a verbatim copy of an external conversation, migrated here for transparency

@rylnd writes: Hey @djptek, I was hoping we could continue the conversation from RFC Threat Integration Stage 3 - merged here:

my concern is that the indicators (in my use case) are really separate events

I’m struggling to wrap my head around this; some examples might be helpful.

and encapsulating them all in a single event using an array:

  • reduces potential for aggregations across similar events
  • prevents aggregation across events e.g. where you might wish to compare threat.enrichments[].indicator.type against threat.indicator.type

I don’t disagree that the nested threat.enrichments[] makes aggregations more difficult, but without specific examples of what you’re trying to do it’s hard to say what the correct approach/structure should be.

In my mind, an event with threat.enrichments means: “this event matched multiple indicators; the details of those indicators, and how they were matched are as follows.” An event with threat.indicator is simply an indicator. An enriched event references one or more indicators, so aggregating across both types of documents implies to me that you’re trying to do some additional enrichment/joining?

Regarding your question:

Did you consider the possibility of denormalising multiple indicators into separate events (with duplicate parent metadata) as an alternative to adding the threat.enrichments[] array?

If I’m understanding correctly, you’re proposing that an event matching two indicators would actually be two events, one for each indicator? Since our documents represent userland events (and not e.g. a “matching” itself), creating one per indicator seems like an unnecessary duplication and a departure from the reality of the system (at least as I am envisioning it).

Since an event matches N indicators, threat.enrichments[] represents that relationship. We had at one point discussed collecting a separate index of “matches”, but the burden of having to “join” to that index whenever alerts were retrieved was dropped in favor of the nested document structure. I think we still have opportunity to pursue this implementation, though, if that may help.

@djptek writes:

creating one per indicator seems like an unnecessary duplication

Denormalising data prior to storage in Elasticsearch is best practice in the majority of cases where you might want to run an aggregation. Elasticsearch applies a lot of compression that mitigates the impact of duplication, and any additional storage cost ought to be more than offset by faster aggregations.

examples

For example, if I wanted to aggregate on threat.indicator.type or threat.indicator.ip, and some of my events used this field while others used threat.enrichments.indicator.type or threat.enrichments.indicator.ip, those fields aren’t directly comparable.

I’m aware that related.ip exists, and I’ve written some Painless to copy the values there. However, that I do see as unnecessary duplication rather than denormalisation, since by using a new field I’m effectively bypassing all the magic of columnar keyword compression that’s going on in Elasticsearch.
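To make the comparability problem concrete, here is a minimal Python sketch (not Elasticsearch code) that counts indicator IPs across both document shapes. The helper name `indicator_ips` and the sample documents are assumptions made for illustration; the point is only that a client has to read two different field paths to get a single count.

```python
from collections import Counter
from typing import Any, Iterator


def indicator_ips(doc: dict[str, Any]) -> Iterator[str]:
    """Yield indicator IPs from either document shape (hypothetical helper)."""
    threat = doc.get("threat", {})
    # Shape 1: the document is itself an indicator (threat.indicator.ip).
    ip = threat.get("indicator", {}).get("ip")
    if ip is not None:
        yield ip
    # Shape 2: an event enriched with N indicators
    # (threat.enrichments[].indicator.ip).
    for enrichment in threat.get("enrichments", []):
        ip = enrichment.get("indicator", {}).get("ip")
        if ip is not None:
            yield ip


# Illustrative documents only; real ECS documents carry many more fields.
docs = [
    {"threat": {"indicator": {"ip": "1.128.0.0"}}},
    {"threat": {"enrichments": [
        {"indicator": {"ip": "1.128.0.0"}},
        {"indicator": {"ip": "1.128.0.1"}},
    ]}},
]

# One count per IP, regardless of which path the value came from.
counts = Counter(ip for doc in docs for ip in indicator_ips(doc))
```

A single-field aggregation would see only one of the two paths, which is the gap that copying values into related.ip works around.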

@rylnd writes: Understood about the denormalization best practice. However, from the security solution perspective, having an alert with two threat.enrichments[] is NOT equivalent to having two alerts, each with single indicators.

An alert represents a rule detecting something in the source data that merits investigation. Rules do not generate duplicate alerts, so per-indicator alerts would break this paradigm and many workflows.

For example, if I wanted to aggregate on threat.indicator.type or threat.indicator.ip, and some of my events used this field while others used threat.enrichments.indicator.type or threat.enrichments.indicator.ip, those fields aren’t directly comparable.

I’m still unclear on what this aggregation would represent, since these are two types of documents: threat.indicator documents represent indicators, while documents with threat.enrichments represent any event that’s been enriched with indicators.

@djptek writes: Thanks Ryland

However, from the security solution perspective, having an alert with two threat.enrichments[] is NOT equivalent to having two alerts, each with single indicators.

If the data for a unique alert with two threat.enrichments[] were to be denormalized, this doesn’t equate to two alerts. There would be two Elasticsearch documents, each with a unique ID and each sharing a common Alert ID. So there is still only one alert.

In isolation, each document represents a unique indicator.

Aggregated on the basis of their common Alert ID, the set of documents represents the alert + indicators. My goal in suggesting this is to ensure that the system delivers best-of-class performance both at ingest and query/aggregation time.
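The denormalisation described here could be sketched in Python roughly as follows. This is an illustration under stated assumptions, not how the Security Solution works: `denormalize` and `regroup` are hypothetical helpers, `event.id` stands in for the common Alert ID, and any match details stored alongside each enrichment are dropped for brevity.

```python
from collections import defaultdict
from copy import deepcopy
from typing import Any


def denormalize(alert: dict[str, Any]) -> list[dict[str, Any]]:
    """Split one enriched alert into one document per matched indicator."""
    parent = deepcopy(alert)  # leave the caller's alert untouched
    enrichments = parent.get("threat", {}).pop("enrichments", [])
    docs = []
    for enrichment in enrichments:
        doc = deepcopy(parent)  # duplicate parent metadata, including event.id
        doc.setdefault("threat", {})["indicator"] = enrichment["indicator"]
        docs.append(doc)
    return docs


def regroup(docs: list[dict[str, Any]]) -> dict[str, list[dict[str, Any]]]:
    """Collect indicators per alert ID (event.id is an assumed stand-in)."""
    grouped: dict[str, list[dict[str, Any]]] = defaultdict(list)
    for doc in docs:
        grouped[doc["event"]["id"]].append(doc["threat"]["indicator"])
    return dict(grouped)


alert = {
    "event": {"id": "1"},
    "threat": {"enrichments": [
        {"indicator": {"ip": "1.128.0.0"}},
        {"indicator": {"ip": "1.128.0.1"}},
    ]},
}

flat = denormalize(alert)  # two documents sharing event.id "1"
by_alert = regroup(flat)   # one entry per alert, listing its indicators
```

Note that in this sketch an alert with no enrichments produces no documents at all, so a real implementation would need to decide how to represent unenriched alerts.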

Is it OK with you if I copy this Slack conversation wholesale into a new GitHub issue referencing the original? :elasticheart:

@rylnd writes:

Is it OK with you if I copy this Slack conversation wholesale into a new GitHub issue referencing the original?

Of course, that sounds fantastic!

Issue Analytics

  • State: open
  • Created a year ago
  • Comments: 10 (6 by maintainers)

Top GitHub Comments

maxcold commented, Sep 7, 2022 (1 reaction)

Thanks for sharing more context! I’m probably not the best person to judge whether it’s a good model, as I haven’t worked on the alerts and how they are enriched with indicators. I only have context about the indicators themselves. In the Security Solution, “alerts enriched with indicators” and indicators themselves are two separate entities. I will still share my thoughts, hoping they are relevant to the discussion. First of all, if we follow the current reality of the Security Solution, your example would look more like:

{"event":{"category":"threat","type":"indicator"},"threat":{"indicator":{"ip":"1.128.0.0"}}}
{"event":{"category":"threat","type":"indicator"},"threat":{"indicator":{"ip":"1.128.0.1"}}}
{"event":{"id":"1", <category, type, etc. of alert>},"threat":{"indicator":{"ip":"1.128.0.0"}}}
{"event":{"id":"1", <category, type, etc. of alert>},"threat":{"indicator":{"ip":"1.128.0.1"}}}
{"event":{"id":"2", <category, type, etc. of alert>},"threat":{"indicator":{"ip":"1.128.0.0"}}}
{"event":{"id":"3", <category, type, etc. of alert>},"threat":{"indicator":{"ip":"1.128.0.0"}}}
{"event":{"id":"3", <category, type, etc. of alert>},"threat":{"indicator":{"ip":"1.128.0.1"}}}

meaning that there are two indicators ingested from some source, plus alerts created by an indicator match rule and enriched with these indicators when a match is found in the source events. One thing is for sure: this data model would, I guess, require a complete redo of everything related to alerts.

Then the question is: what is the goal of this particular discussion? If it is to find a solution to your problem at hand with a 3rd party integration, maybe it is a good idea to model things the same way they are currently modeled in the Security Solution. That is, if the 3rd party has alerts enriched with indicators, create indicators in addition to the alerts enriched with these indicators. Then we arrive at the concerns you have regarding dashboards and aggregations, and here I would really like to learn which dashboards and aggregations you are building, as they might be very relevant to our team’s scope. As mentioned, we are working on the Threat Intelligence capabilities around Indicators of Compromise and want to learn more about everything related to them, and your use case (building dashboards around indicators and alerts enriched with indicators) seems to be very new to us.

If the goal of the discussion is to propose and discuss the future state, then I think @rylnd is the right person to ask for feedback on the proposed model, as he has much more context about alerts and enrichments.
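As a rough illustration of the kind of aggregation the flat model in the sample documents would allow, the Python sketch below counts distinct alerts per indicator IP. It is only a sketch under assumptions: the alert documents keep just `event.id`, because their category and type are elided in the example, and indicator documents are recognised by `event.type` being `indicator`.

```python
from collections import defaultdict


def doc(event: dict, ip: str) -> dict:
    """Build one flat sample document (hypothetical helper)."""
    return {"event": event, "threat": {"indicator": {"ip": ip}}}


INDICATOR = {"category": "threat", "type": "indicator"}

# The seven sample documents: two indicators, then per-indicator alert docs.
docs = [
    doc(INDICATOR, "1.128.0.0"),
    doc(INDICATOR, "1.128.0.1"),
    doc({"id": "1"}, "1.128.0.0"),
    doc({"id": "1"}, "1.128.0.1"),
    doc({"id": "2"}, "1.128.0.0"),
    doc({"id": "3"}, "1.128.0.0"),
    doc({"id": "3"}, "1.128.0.1"),
]

# Group alert IDs by indicator IP, skipping the indicator documents themselves.
alerts_per_ip: dict[str, set[str]] = defaultdict(set)
for d in docs:
    if d["event"].get("type") == "indicator":
        continue
    alerts_per_ip[d["threat"]["indicator"]["ip"]].add(d["event"]["id"])

# Distinct alert count per indicator IP.
alert_counts = {ip: len(ids) for ip, ids in alerts_per_ip.items()}
```

Because every document exposes the IP at the same path, a single grouping key covers both indicators and alerts, at the cost of one document per alert and indicator pair.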

maxcold commented, Sep 15, 2022 (0 reactions)

@djptek happy to chat about your use case a bit more. If having a call works for you, feel free to schedule something!
