Using ECS fields in Elasticsearch indices / data streams


The number of ECS fields has grown over the last few years and now exceeds the 1024-field limit in Elasticsearch, which causes issues in several places. In addition, some components, Beats for example, started importing all ECS fields by default, so the Beats templates and mappings define many fields that are likely never used. This issue is meant to discuss the problem in more detail, along with potential solutions.

Some of the guidelines for a solution I have in mind:

  • Minimal mapping: The resulting mapping of indices / data streams should only contain the fields that are actually used. This keeps the mappings compact and ensures that field suggestions are only made for fields that actually exist.
  • We need to assume 100+ data streams using ECS mappings: A large number of data streams with these ECS mappings and templates will exist. We need to think about what this means for Elasticsearch, the cluster state, etc.
  • ECS compatible: A user should be able to index any fields as long as they do not conflict with ECS. For example, host can only be an object and not a keyword.

Potential solutions

To kick things off, here is a list of potential solutions, though I don’t think any of them is ideal.

Data streams with ECS enabled

A data stream has a setting such as ecs: true. With this flag set, the data stream gets the mappings for the ECS fields by default. No templates would have to be configured, but the fields could still be extended through templates. How this would work exactly, I don’t know 😃

Use dynamic templates

Instead of specifying the fields directly, dynamic templates are used. This has the benefit that fields don’t show up in the mappings until they are actually used. This ECS template could exist in Elasticsearch as a component template. I wonder what the effect would be if 100 templates used this ECS template. @jpountz Does it mean it exists only once in the cluster state or 100 times?

Dynamic template example for `log.level`:
DELETE /test?ignore_unavailable=true
PUT /test
{
  "mappings": {
    "dynamic_templates": [
      {
        "log_level": {
          "path_match": "log.level",
          "mapping": {
            "type": "keyword"
          }
        }
      }
    ]
  }
}
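
The component template idea mentioned above could look roughly like the following. This is only a sketch: the names `ecs-dynamic-sketch` and `logs-example-sketch` and the `logs-example-*` pattern are made up for illustration, and the `@timestamp` mapping is included only because data streams need that field.

```console
# Hypothetical component template holding the ECS dynamic template rules
PUT _component_template/ecs-dynamic-sketch
{
  "template": {
    "mappings": {
      "dynamic_templates": [
        {
          "log_level": {
            "path_match": "log.level",
            "mapping": { "type": "keyword" }
          }
        }
      ],
      "properties": {
        "@timestamp": { "type": "date" }
      }
    }
  }
}

# Hypothetical index template for data streams that reuses the component
PUT _index_template/logs-example-sketch
{
  "index_patterns": ["logs-example-*"],
  "data_stream": {},
  "priority": 200,
  "composed_of": ["ecs-dynamic-sketch"]
}
```

With a setup like this, `log.level` should only show up in the mapping of a backing index once a document containing that field has been indexed, which can be checked with `GET logs-example-default/_mapping` (again a made-up data stream name).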

Use dynamic templates to match most of the fields

Most ECS fields are of type keyword. Dynamic template rules can be used to map the majority of the fields correctly, similar to https://github.com/elastic/elasticsearch/blob/feature/apm-integration/x-pack/plugin/core/src/main/resources/data-streams-mappings.json. This has the advantage that the mapping stays compact and only enforces mappings for fields that are likely to conflict, like host or error. The downside is that not all of ECS is enforced, so someone can ingest fields that conflict with ECS.
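
A minimal sketch of this approach, not taken from the linked file: one catch-all rule maps every new string field to keyword, and only the conflict-prone fields are declared explicitly. The index name `test-generic`, the rule name `strings_as_keyword`, and the `ignore_above` value are assumptions for illustration.

```console
PUT /test-generic
{
  "mappings": {
    "dynamic_templates": [
      {
        "strings_as_keyword": {
          "match_mapping_type": "string",
          "mapping": { "type": "keyword", "ignore_above": 1024 }
        }
      }
    ],
    "properties": {
      "host": { "type": "object" },
      "error": { "type": "object" }
    }
  }
}
```

With this mapping, a document that sends `host` as a plain string is rejected because `host` is declared as an object, while an ECS field that is not a keyword (a date or an IP, for example) would only be mapped correctly if it had its own rule or explicit mapping.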

Define on ingest time

If I remember correctly, Elasticsearch supports defining the type of a field at ingest time. If we control both the creation and the shipping of the data, the shipper could take on the role of enforcing ECS. This might work for cases like alerts inside Kibana, but it does not work for use cases where we don’t control the data.
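
One feature that may be what is meant here is the `dynamic_templates` parameter of the bulk API, which lets the sender name the dynamic template to apply to a given field. The sketch below assumes the `test` index and its `log_level` dynamic template from the example earlier in this issue already exist; the field `custom.severity` is made up for illustration.

```console
# The shipper asks for custom.severity to be mapped using the log_level template
POST /_bulk
{ "index": { "_index": "test", "dynamic_templates": { "custom.severity": "log_level" } } }
{ "custom": { "severity": "warn" } }
```

The parameter only affects fields that are not yet mapped, and it refers to templates by name, so the templates themselves still have to be defined in the index mapping.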

Heavily use runtime fields in data views

Instead of specifying all the mappings, rely heavily on runtime fields for ECS. Rather than having ECS in the Elasticsearch mapping, offer it as a checkbox or similar in data views. This way users can query all the ECS fields, but we don’t enforce ECS at ingest time.
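
At the Elasticsearch level, the query-time equivalent can be sketched with `runtime_mappings` in a search request. The index name `my-unmapped-logs` is hypothetical, and how a data view would generate such definitions for all of ECS is left open here.

```console
# log.level is not in the index mapping; it is defined only for this search.
# A runtime field without a script reads its value from _source.
GET /my-unmapped-logs/_search
{
  "runtime_mappings": {
    "log.level": { "type": "keyword" }
  },
  "query": {
    "term": { "log.level": "error" }
  }
}
```

The trade-off is that runtime fields are evaluated at query time, so queries on them are typically slower than queries on indexed fields.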

Split up ECS in multiple layers

One of the core issues is that ECS keeps growing, and this is very unlikely to stop. In the early days of ECS we discussed having layers, something like core, base, and extended: a very small set of core fields that everyone should be aware of, base fields that are very common, and extended fields organized into multiple groups / use cases.

Such a grouping would make it possible to include, for example, only core or base in our templates instead of ALL fields.

Issue Analytics

  • State: open
  • Created: a year ago
  • Reactions: 5
  • Comments: 16 (16 by maintainers)

Top GitHub Comments

4 reactions
jpountz commented, Apr 5, 2022

> Does it mean it exists only once in the cluster state or 100 times?

It would exist 100 times.

At first sight the option I like the most is the first one of somehow packaging ECS with Elasticsearch so that Elasticsearch could optionally make dynamic mapping rules ECS-compliant. We’d naturally have ECS mappings in a single place this way without polluting the cluster state, and indices could always get the latest ECS conventions for newly introduced fields, even if they were created a while ago.

I wonder if it would also make it easier for integrations to not include optional fields, so that they wouldn’t be suggested in Kibana even though these fields are populated in none of the documents, like we saw on the netflow integration.

2 reactions
jpountz commented, Apr 5, 2022

Haha, you’re asking too much of me, I don’t know. 😃 I opened an issue on the Elasticsearch repository to start gathering feedback: https://github.com/elastic/elasticsearch/issues/85692.
