Using ECS fields in Elasticsearch indices / data streams
The number of ECS fields has grown over the last few years and now exceeds Elasticsearch's default limit of 1000 fields per index (`index.mapping.total_fields.limit`). This causes issues in several places. In addition, in Beats for example, we started importing all ECS fields by default, which means the Beats templates and mappings define many fields that are likely never used. This issue is meant to discuss the problem in more detail, along with potential solutions.
Some of the guidelines for a solution I have in mind:
- Minimal mapping: The resulting mapping of indices / data streams should only contain the fields that are actually used. This keeps the mappings compact and ensures that fields are only recommended if they actually exist.
- We need to assume 100+ data streams using ECS mappings: A large number of data streams with these ECS mappings and templates will exist. We need to think of what this means for Elasticsearch, cluster state etc.
- ECS compatible: A user should be able to index any fields as long as these do not conflict with ECS. For example, `host` can only be an object and not a keyword.
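The "ECS compatible" guideline can be illustrated with a small Python sketch. The `ECS_TYPES` table is a hand-picked, hypothetical excerpt rather than the real ECS schema, and `find_conflicts` is an illustrative helper, not an existing API. It flags a document value whose shape contradicts the ECS definition, such as `host` sent as a string, while leaving custom fields alone.

```python
# Hypothetical excerpt of ECS field definitions; the real schema is far larger.
ECS_TYPES = {
    "host": "object",
    "host.name": "keyword",
    "log.level": "keyword",
}


def find_conflicts(doc, prefix=""):
    """Return dotted field paths whose value shape contradicts ECS_TYPES."""
    conflicts = []
    for key, value in doc.items():
        path = f"{prefix}{key}"
        expected = ECS_TYPES.get(path)
        is_object = isinstance(value, dict)
        if expected == "object" and not is_object:
            # ECS says object, the document sent a concrete value.
            conflicts.append(path)
        elif expected is not None and expected != "object" and is_object:
            # ECS says leaf field, the document sent an object.
            conflicts.append(path)
        if is_object:
            conflicts.extend(find_conflicts(value, prefix=f"{path}."))
    return conflicts


print(find_conflicts({"host": "web-1"}))
print(find_conflicts({"host": {"name": "web-1"}, "my_field": 1}))
```

Fields that ECS does not define (`my_field` here) are never reported, which matches the idea that users may index anything that does not collide with ECS.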
Potential solutions
To kick things off, here is a list of potential solutions, though I think none of them is ideal.
Data streams with ECS enabled
A data stream has a setting such as `ecs: true`. With this flag set, the data stream gets the mappings for the ECS fields by default. No templates would have to be set up, but the fields could be extended by templates. How exactly this would work, I don't know 😃
Use dynamic templates
Instead of specifying the fields directly, dynamic templates are used. This has the benefit that the fields don't show up in the mappings until they are actually used. This ECS template could exist in Elasticsearch as a component template. I wonder what the effect would be if 100 templates used this ECS template. @jpountz Does it mean it exists only once in the cluster state, or 100 times?
Dynamic template example for `log.level`:
DELETE /test?ignore_unavailable
PUT /test
{
  "mappings": {
    "dynamic_templates": [
      {
        "log_level": {
          "path_match": "log.level",
          "mapping": {
            "type": "keyword"
          }
        }
      }
    ]
  }
}
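To make the component template idea more concrete, here is a rough Python sketch that only builds and prints request bodies, without talking to a cluster. It wraps the `log.level` dynamic template above in a component template and lets several index templates reference it through `composed_of`. The template names, the dataset names, and the priority value are made up for illustration.

```python
import json

# Hypothetical component template name.
ECS_COMPONENT_NAME = "ecs-dynamic-mappings"

# Body for PUT _component_template/<name>: the dynamic template lives here once.
ecs_component_body = {
    "template": {
        "mappings": {
            "dynamic_templates": [
                {
                    "log_level": {
                        "path_match": "log.level",
                        "mapping": {"type": "keyword"},
                    }
                }
            ]
        }
    }
}


def index_template_body(pattern):
    """Body for PUT _index_template/<name> that reuses the ECS component."""
    return {
        "index_patterns": [pattern],
        "data_stream": {},
        # Assumed value, chosen to win over lower-priority templates that
        # match the same pattern.
        "priority": 200,
        "composed_of": [ECS_COMPONENT_NAME],
    }


print(f"PUT _component_template/{ECS_COMPONENT_NAME}")
print(json.dumps(ecs_component_body, indent=2))
for dataset in ("nginx.access", "system.syslog"):  # hypothetical datasets
    print(f"PUT _index_template/logs-{dataset}")
    print(json.dumps(index_template_body(f"logs-{dataset}-*"), indent=2))
```

The definition is written once and referenced by name, which keeps the templates themselves small. It does not by itself answer the cluster state question raised above, since each backing index still carries its own resolved mapping.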
Use dynamic templates to match most of the fields
Most ECS fields are `keyword`. Dynamic template rules can be used to match the majority of the fields correctly, similar to https://github.com/elastic/elasticsearch/blob/feature/apm-integration/x-pack/plugin/core/src/main/resources/data-streams-mappings.json. This has the advantage that the mapping stays compact and only enforces mappings for fields which are likely to conflict, like `host` or `error`. It has the downside that not all of ECS is enforced and someone can ingest fields which might conflict with ECS.
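A minimal sketch of what such a compact mapping could look like, written as a Python dict that is printed as a request body. It is not taken from the linked file: the rule name and the `ignore_above` value are assumptions. One catch-all rule maps every new string field to `keyword`, and only the conflict-prone `host` and `error` are pinned explicitly as objects.

```python
import json

mappings = {
    "dynamic_templates": [
        {
            # Hypothetical rule name; applies to any newly seen string field.
            "strings_as_keyword": {
                "match_mapping_type": "string",
                "mapping": {"type": "keyword", "ignore_above": 1024},
            }
        }
    ],
    # Explicitly pinned so that a document sending `host` or `error`
    # as a plain string is rejected instead of redefining the field.
    "properties": {
        "host": {"type": "object"},
        "error": {"type": "object"},
    },
}

print("PUT /test")
print(json.dumps({"mappings": mappings}, indent=2))
```

Fields that are not `keyword` in ECS, for example text-like fields such as `message`, would need their own rule or explicit mapping ahead of the catch-all.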
Define at ingest time
If I remember correctly, Elasticsearch supports defining the type of a field at ingest time. If we are in control of both data creation and the data shipper, enforcing ECS could be handled by the shipper. This might work for cases like alerts inside Kibana but does not work for use cases where we don't control the data.
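One possible reading of "defining the type at ingest time" is the `dynamic_templates` parameter on bulk actions, which maps a field's full name to the name of a dynamic template defined on the index. That interpretation is an assumption. The Python sketch below only builds the NDJSON body for `POST _bulk`; the template name `ecs_keyword`, the index name, and the helper `bulk_lines` are made up for illustration.

```python
import json

# Index-side prerequisite: a named dynamic template without a match
# condition, so it only applies when a request refers to it by name.
index_mappings = {
    "dynamic_templates": [
        {"ecs_keyword": {"mapping": {"type": "keyword"}}}
    ]
}


def bulk_lines(index, docs, field_templates):
    """Build an NDJSON bulk body that tags fields with template names."""
    lines = []
    for doc in docs:
        action = {
            "index": {"_index": index, "dynamic_templates": field_templates}
        }
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


body = bulk_lines(
    "test",
    [{"log": {"level": "info"}, "message": "started"}],
    {"log.level": "ecs_keyword"},
)
print(body)
```

In this setup the shipper carries the knowledge of which field gets which type, which is why it only helps where we control the shipper.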
Heavily use runtime fields in data views
Instead of specifying all the mappings, rely heavily on runtime fields for ECS. Instead of having ECS in the Elasticsearch mapping, offer it as a checkbox or similar in data views. This way, users can query all the ECS fields, but we don't enforce ECS at ingest time.
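As a rough illustration of the runtime field approach, the Python sketch below builds a `_search` body that declares a few ECS fields under `runtime_mappings`. A runtime field declared without a script reads the value of the same name from `_source`. The `ECS_RUNTIME_TYPES` table is a hypothetical excerpt and the helper name is made up; how a data view would actually wire this up is not specified here.

```python
import json

# Hypothetical excerpt: ECS field name -> runtime field type.
ECS_RUNTIME_TYPES = {
    "log.level": "keyword",
    "event.duration": "long",
    "source.ip": "ip",
}


def search_with_ecs_runtime_fields(query):
    """Build a _search body that declares ECS fields as runtime fields."""
    return {
        "runtime_mappings": {
            name: {"type": field_type}
            for name, field_type in ECS_RUNTIME_TYPES.items()
        },
        "query": query,
    }


body = search_with_ecs_runtime_fields({"term": {"log.level": "error"}})
print("POST /test/_search")
print(json.dumps(body, indent=2))
```

The trade-off is that runtime fields are evaluated at query time, so the index mapping stays small at the cost of slower queries on those fields.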
Split up ECS in multiple layers
One of the core issues is that ECS keeps growing, and very likely this is not going to stop. In the early days of ECS we discussed having layers of ECS, something like core, base, and extended: a very small set of core fields that everyone should be aware of, base fields that are very common, and extended fields split into multiple groups / use cases.
Such a grouping would make it possible, for example, to have only core or base in our templates instead of ALL fields.
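A small Python sketch of how layered templates could be generated, assuming each field definition carried a layer tag. The field list, the layer assignments, and the helper `properties_up_to` are hypothetical and only follow the core / base / extended idea described here.

```python
# Hypothetical field list with made-up layer assignments.
FIELDS = [
    {"name": "@timestamp", "type": "date", "layer": "core"},
    {"name": "message", "type": "match_only_text", "layer": "core"},
    {"name": "host.name", "type": "keyword", "layer": "base"},
    {"name": "log.level", "type": "keyword", "layer": "base"},
    {"name": "tls.client.ja3", "type": "keyword", "layer": "extended"},
]
LAYER_ORDER = ["core", "base", "extended"]


def properties_up_to(layer):
    """Nested `properties` mapping for all fields at or below the given layer."""
    allowed = set(LAYER_ORDER[: LAYER_ORDER.index(layer) + 1])
    root = {}
    for field in FIELDS:
        if field["layer"] not in allowed:
            continue
        node = root
        *parents, leaf = field["name"].split(".")
        for parent in parents:
            # Create intermediate object mappings on demand.
            node = node.setdefault(parent, {"properties": {}})["properties"]
        node[leaf] = {"type": field["type"]}
    return root


print(sorted(properties_up_to("core")))
print(sorted(properties_up_to("base")))
```

A template built from `properties_up_to("base")` would leave out the `tls` fields entirely, and an integration that needs them could add the extended group on top.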
Top GitHub Comments
It would exist 100 times.
At first sight, the option I like the most is the first one: somehow packaging ECS with Elasticsearch so that Elasticsearch could optionally make dynamic mapping rules ECS-compliant. We'd naturally have ECS mappings in a single place this way without polluting the cluster state, and indices could always get the latest ECS conventions for newly introduced fields, even if they were created a while ago.
I wonder if it would also make it easier for integrations to not include optional fields, so that they wouldn’t be suggested in Kibana even though these fields are populated in none of the documents, like we saw on the netflow integration.
Haha, you’re asking too much of me, I don’t know. 😃 I opened an issue on the Elasticsearch repository to start gathering feedback: https://github.com/elastic/elasticsearch/issues/85692.