Unclear docs
Partial copy of this thread: https://discuss.elastic.co/t/writing-to-multiple-indices-and-documentation-about-it/82859
Some things in the documentation are unclear.
At one point it says:
Note that multiple indices and/or types are allowed only for reading
But the next paragraph is titled “Dynamic/multi resource writes” and states that
For writing, elasticsearch-hadoop allows the target resource to be resolved at runtime by using patterns (by using the {<field-name>} format), resolved at runtime based on the data being streamed to Elasticsearch. That is, one can save documents to a certain index or type based on one or multiple fields resolved from the document about to be saved.
It seems like these two statements contradict each other. I’m still not sure whether it is possible to write to multiple indices or not.
Secondly, the example with the timestamp is also unclear. I think of a resource as ‘index/type’ (please correct me if I’m wrong). In the example
# index the documents based on their date
es.resource.write = my-collection/{@timestamp:YYYY.MM.dd}
the timestamp is the type, right? But usually we separate indices by time, not data types. This seems wrong. I would expect the timestamp to be part of the index:
# index the documents based on their date
es.resource.write = my-collection.{@timestamp:YYYY.MM.dd}/{media_type}
Is this a valid resource? Will it work as expected (i.e. write to multiple indices)?
P.S. If I have posted this issue in the wrong place, please point me to the proper place to report it. I had zero replies on the forum for almost a month. Elasticsearch/docs says: “If you find an error in the documentation, you should open an issue or pull request on the repository which contains the docs”
Issue Analytics
- Created 6 years ago
- Comments:10 (5 by maintainers)
Top GitHub Comments
That’s fine with me to open an enhancement ticket for that. Thanks!
This could certainly be cleared up a bit more:
The note that multiple indices and/or types are allowed only for reading corresponds more toward using index and type names like _all/foo, where multiple indices are being read by usage of a pattern sent to Elasticsearch.
In the “Dynamic/multi resource writes” case, it explains that you can use a special pattern (denoted by curly braces) to have the connector determine which index and type to save documents to at runtime, using the values stored in the documents’ fields. This is different from the above because we are resolving the resource to a single target resource at write time for each document. If you were to use this pattern with something that does not resolve to a single index (_all/{field}, for instance, will have a single type resolved, but the _all index does not correspond to a single index), then the writing operation will not be successful.
As for the two timestamp examples: this is mostly to highlight that you can use the @timestamp field in a pattern, and format it however you like. These patterns can exist in either the index path element or the type path element; it makes no difference. Multiple patterns can be used as well. They will be resolved at runtime, as long as, once resolved with data from the document, they point to a single index.
Does that clear things up? I’ll look into expanding the documentation around this to clarify the differences in “multiple indices” for each situation.
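To make the per-document resolution concrete, here is a minimal Python sketch of the idea. It is not the connector's implementation: resolve_resource is a hypothetical helper, the sample document is made up, and the mapping of YYYY, MM and dd to strftime codes is an assumption that covers only the date format used in this thread. It only illustrates that each document resolves the pattern to exactly one index/type, and that a target such as _all is rejected.

```python
import re
from datetime import datetime

# Hypothetical illustration only; this is NOT elasticsearch-hadoop code.
# Matches {field} and {field:format} placeholders.
_PLACEHOLDER = re.compile(r"\{([^{}:]+)(?::([^{}]+))?\}")

# Assumed mapping from the date tokens seen in this thread to strftime codes.
_DATE_TOKENS = {"YYYY": "%Y", "MM": "%m", "dd": "%d"}


def _format_date(value, fmt):
    """Format an ISO-8601 timestamp string using the assumed token mapping."""
    timestamp = datetime.fromisoformat(value)
    for token, code in _DATE_TOKENS.items():
        fmt = fmt.replace(token, code)
    return timestamp.strftime(fmt)


def resolve_resource(pattern, doc):
    """Resolve every placeholder in `pattern` from the fields of `doc`."""
    def substitute(match):
        field, fmt = match.group(1), match.group(2)
        if field not in doc:
            raise KeyError(f"document has no field {field!r}")
        value = doc[field]
        return _format_date(value, fmt) if fmt else str(value)

    resource = _PLACEHOLDER.sub(substitute, pattern)
    index = resource.split("/", 1)[0]
    # A write target must name exactly one index, so reject multi-index names.
    if index == "_all" or "*" in index or "," in index:
        raise ValueError(f"{resource!r} does not resolve to a single index")
    return resource


# Made-up sample document.
doc = {"@timestamp": "2017-04-18T10:15:00", "media_type": "book"}

# Pattern in the type element, as in the documentation example.
print(resolve_resource("my-collection/{@timestamp:YYYY.MM.dd}", doc))
# Patterns in both the index and the type element.
print(resolve_resource("my-collection.{@timestamp:YYYY.MM.dd}/{media_type}", doc))
```

With this sample document the two calls print my-collection/2017.04.18 and my-collection.2017.04.18/book, while a pattern such as _all/{media_type} raises ValueError in the sketch because the index part does not name a single index.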