[CEP] Use native Elasticsearch reindexing for index changes
Abstract Incorporate https://www.elastic.co/guide/en/elasticsearch/reference/2.4/docs-reindex.html into our automatic Elasticsearch reindexing setup.
Motivation Native reindexing is reportedly much faster than resyncing all the docs ourselves.
Specification There should likely be a fallback method for when we need to reindex data in place because of an issue with the pillows, as opposed to reindexing because we changed the mapping, which is the more common case.
Impact on users This should not affect users at all.
Impact on hosting
This change should be transparent to local hosting setups. If done before the EOL of our ES 1 backend option, it should fall back to the current behavior when the setting ELASTICSEARCH_MAJOR_VERSION = 1 is used.
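The fallback rules could be sketched roughly as follows. This is only an illustration: `ReindexStrategy` and `choose_reindex_strategy` are hypothetical names, not existing code, and treating "no mapping change" as the in-place case from the Specification is an assumption.

```python
from enum import Enum


class ReindexStrategy(Enum):
    NATIVE = "native"  # use Elasticsearch's own reindex API
    RESYNC = "resync"  # current behavior: resync all the docs ourselves


def choose_reindex_strategy(es_major_version, mapping_changed):
    """Hypothetical helper: pick how to reindex.

    es_major_version mirrors the ELASTICSEARCH_MAJOR_VERSION setting.
    mapping_changed is True for the common case (a mapping change) and
    False when data must be rebuilt in place, e.g. after a pillow issue.
    """
    if es_major_version < 2:
        # ES 1 backend: keep the current behavior.
        return ReindexStrategy.RESYNC
    if not mapping_changed:
        # Assumption: if the indexed docs themselves are suspect, copying
        # them from the old index would not help, so resync from the source.
        return ReindexStrategy.RESYNC
    return ReindexStrategy.NATIVE
```

With this shape, local setups on ES 1 would never reach the native code path, which is what keeps the change transparent to them.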
Backwards compatibility
Besides backwards compatibility with ELASTICSEARCH_MAJOR_VERSION = 1 described above, this should be a drop-in replacement for our current system with no major effects on users or devops, other than reindexes being faster.
Release Timeline There is no hard date by which we must do this, but we’d probably want to do it before the next time we reindex forms or cases, as in https://github.com/dimagi/commcare-hq/pull/25666.
Open questions and issues I’m not sure we fully understand the behavior of the native Elasticsearch reindex functionality. There’s always the tricky issue of how to make sure we don’t skip any items that come in between when we start the reindex and when we flip all new reads and writes to the new index. It’s possible that our current code already handles this correctly and in a way that cleanly applies to the proposed reindex implementation.
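One way to check the "no skipped items" concern after a reindex might be to compare document IDs between the two indexes. The sketch below assumes the IDs have already been fetched from each index by some other means; `find_skipped_ids` and the sample IDs are hypothetical.

```python
def find_skipped_ids(source_ids, target_ids):
    """Return IDs present in the source index but missing from the target."""
    return sorted(set(source_ids) - set(target_ids))


# Illustrative data only: "form-3" arrived during the reindex and was missed.
old_index_ids = ["form-1", "form-2", "form-3"]
new_index_ids = ["form-1", "form-2"]
print(find_skipped_ids(old_index_ids, new_index_ids))  # ['form-3']
```

An empty result would suggest nothing was dropped, although it says nothing about whether the copied docs are the latest versions.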
Issue Analytics
- Created 4 years ago
- Comments:6 (6 by maintainers)
Top GitHub Comments
I haven’t thought about this much, but reading the docs I see there are options for updating, overwriting, or ignoring documents that already exist in the target index. One option would be to start the pillow writing to both the old and new indexes before the reindex starts and configure the ES reindex to ignore existing docs.
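For reference, the "ignore existing docs" configuration described in the linked reindex docs combines `op_type: create` on the destination with `conflicts: proceed`. A minimal sketch of building that request body is below; `build_reindex_body` and the index names are hypothetical, and the body would be sent as a POST to the `_reindex` endpoint.

```python
import json


def build_reindex_body(source_index, dest_index):
    """Build a reindex request body that skips docs already in the target."""
    return {
        # Count version conflicts instead of aborting the whole reindex.
        "conflicts": "proceed",
        "source": {"index": source_index},
        # op_type "create" only writes docs missing from the target, so docs
        # the pillow has already written there are left untouched.
        "dest": {"index": dest_index, "op_type": "create"},
    }


# Hypothetical index names, for illustration only.
body = build_reindex_body("forms_old", "forms_new")
print(json.dumps(body, indent=2))
```

Because the pillow would already be writing to the new index, any doc that changes during the reindex should already be present there in its newer form, and the create-only copy would not overwrite it.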
Just looking at our current reindex workflow, I think the part that sets the pillow checkpoints is broken: either it does not set the checkpoint at all (e.g. the SQL form reindexer) or it uses the old pillows (e.g. the user reindexer).
@sravfeyn can you update this with the current state of the reindex tools you used?