Reindex helper does not respect size limit

Not sure if this is possible, but I haven’t found a way to apply a maximum number of documents to reindex using the reindex helper. Everything I’ve tried just reindexes everything.

Using a size limit with search works as expected, but reindex doesn’t seem to respect this. I can use curl with the same query for reindexing and it works as expected. Is there something I’m missing by any chance?

I'm currently using the 2.4.1 release.

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

1 reaction
honzakral commented, Apr 17, 2017

Yes, the limit is ignored because reindex uses a scan search to fetch all the documents; you'd have to craft a query that matches only N documents in your index.

But if you look at the implementation of reindex, you'll see how simple it is: it literally feeds search results into bulk, which you can do yourself with a size-limited search API call:

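from elasticsearch.helpers import bulk

# Fetch at most 1000 matching documents from the source cluster.
data = es_production.search(index='i', body={"size": 1000, "query": {...}})
# Feed the hits straight into the bulk helper on the target cluster.
bulk(es_sandbox, data['hits']['hits'])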
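
If you need the copies to land in a differently named index, or want more documents than a single search page returns, one option is to cap the scan iterator yourself. The following is only a rough sketch, not the helper's own implementation: the names limited_actions and reindex_limited are made up for illustration, and it assumes each hit carries _id and _source (plus _type on older clusters).

```python
from itertools import islice


def limited_actions(hits, target_index, limit):
    """Yield at most `limit` bulk actions built from search/scan hits.

    Hypothetical helper for illustration; not part of elasticsearch-py.
    """
    for hit in islice(hits, limit):
        action = {
            "_index": target_index,  # redirect the document to the target index
            "_id": hit["_id"],
            "_source": hit["_source"],
        }
        # Older clusters (such as 2.x) still use mapping types; keep it if present.
        if "_type" in hit:
            action["_type"] = hit["_type"]
        yield action


def reindex_limited(source_client, target_client, source_index,
                    target_index, query, limit):
    """Copy at most `limit` matching documents between clusters (sketch)."""
    # Imported here so limited_actions can be tried without the client installed.
    from elasticsearch.helpers import bulk, scan

    hits = scan(source_client, query=query, index=source_index)
    return bulk(target_client, limited_actions(hits, target_index, limit))
```

A call could look like reindex_limited(es_production, es_sandbox, 'i', 'i_copy', {"query": {"match_all": {}}}, 1000), where the target index name and the match_all query are placeholders. Because the iterator is cut short, the scroll on the source cluster may stay open until it times out.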

0 reactions
Battleroid commented, Apr 17, 2017

Ah ok. Thank you.

– Casey Weed
