[CEP] Use native Elasticsearch reindexing for index changes

Abstract: Incorporate the native Elasticsearch Reindex API (https://www.elastic.co/guide/en/elasticsearch/reference/2.4/docs-reindex.html) into our automatic Elasticsearch reindexing setup.

Motivation: Native reindexing is supposedly much faster than resyncing all the docs ourselves.

Specification: There should likely be a fallback method for cases where we need to reindex data in place because of an issue with the pillows, as opposed to the more common case of reindexing because we changed the mapping.

Impact on users: This should not affect users at all.

Impact on hosting: This change should be transparent to local hosting setups. If done before the EOL of our ES 1 backend option, it should fall back to the current behavior when the setting ELASTICSEARCH_MAJOR_VERSION = 1 is used.
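
The fallback rules from the Specification and Impact on hosting sections could be sketched roughly as follows. This is only an illustration: `choose_reindex_strategy`, `ReindexReason`, and the strategy names are hypothetical and not part of the existing codebase; only the `ELASTICSEARCH_MAJOR_VERSION` setting comes from the proposal. The sketch assumes native reindexing is only useful when the source index can be trusted (a mapping change), and relies on the `_reindex` API not being available in Elasticsearch 1.x.

```python
from enum import Enum


class ReindexReason(Enum):
    """Hypothetical reasons for triggering a reindex."""
    MAPPING_CHANGE = "mapping_change"  # index data is fine, only the mapping changed
    PILLOW_ISSUE = "pillow_issue"      # index data itself may be wrong or incomplete


def choose_reindex_strategy(es_major_version, reason):
    """Return "native" to use the ES _reindex API, or "resync" to rebuild
    the index from the primary data store (the current behavior)."""
    # The _reindex API does not exist on ES 1.x, so keep the current behavior.
    if es_major_version < 2:
        return "resync"
    # Copying from an index that is itself suspect would copy the problem too.
    if reason is ReindexReason.PILLOW_ISSUE:
        return "resync"
    return "native"
```

A caller would pass in the value of `ELASTICSEARCH_MAJOR_VERSION` from settings, so ES 1 setups keep resyncing without any configuration change.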

Backwards compatibility: Besides the backwards compatibility with ELASTICSEARCH_MAJOR_VERSION = 1 described above, this should be an in-place replacement of our current system with no major effects on users or devops, other than reindexes being faster.

Release timeline: There is no hard date by which we must do this, but we’d probably want to do it before the next time we reindex forms or cases, as in https://github.com/dimagi/commcare-hq/pull/25666.

Open questions and issues: I’m not sure we fully understand the behavior of the native elasticsearch reindex functionality. There’s always the tricky issue of how to make sure we don’t skip any items that have come in between when we start the reindex and when we flip all new reads and writes to it; it’s possible that our current code already handles this correctly and in a way that cleanly applies to the proposed reindex implementation.

Issue Analytics

  • State: open
  • Created 4 years ago
  • Comments: 6 (6 by maintainers)

Top GitHub Comments

1 reaction
snopoke commented, Jan 29, 2020

I’m not sure we fully understand the behavior of the native elasticsearch reindex functionality. There’s always the tricky issue of how to make sure we don’t skip any items that have come in between when we start the reindex and when we flip all new reads and writes to it; it’s possible that our current code already handles this correctly and in a way that cleanly applies to the proposed reindex implementation.

I haven’t thought about this much, but reading the docs I see there are options for updating, overwriting, or ignoring documents that already exist in the target index. One option would be to start the pillow writing to both the old and new indexes before the reindex starts and configure the ES reindex to ignore existing docs.
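
As a rough sketch of that second option, the request body below asks Elasticsearch to create only documents that are missing from the target index and to keep going on version conflicts instead of aborting. `op_type: create` and `conflicts: proceed` are documented options of the Reindex API; the helper names, index names, and host URL are placeholders for illustration, not anything from the existing code.

```python
import json
import urllib.request


def build_reindex_body(source_index, dest_index):
    """Build a _reindex request body that skips docs already in dest_index."""
    return {
        # Count version conflicts instead of aborting the whole reindex.
        "conflicts": "proceed",
        "source": {"index": source_index},
        # "create" only writes docs missing from the destination, so anything
        # the pillow has already written there is left untouched.
        "dest": {"index": dest_index, "op_type": "create"},
    }


def start_reindex(es_url, source_index, dest_index):
    """POST the body to the cluster's _reindex endpoint and return the parsed reply."""
    body = json.dumps(build_reindex_body(source_index, dest_index)).encode("utf-8")
    request = urllib.request.Request(
        es_url.rstrip("/") + "/_reindex",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `start_reindex("http://localhost:9200", "old_index", "new_index")` would run against a local cluster. Whether this fully closes the gap between the start of the reindex and the read/write flip still depends on the pillow writing to both indexes for the entire duration.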

Just looking at our current reindex workflow, I think the part that sets the pillow checkpoints is broken, because either it does not set the checkpoint at all (e.g. the SQL form reindexer) or it uses the old pillows (e.g. the user reindexer).

0 reactions
snopoke commented, Apr 29, 2020

@sravfeyn can you update this with the current state of the reindex tools you used?
