Performance of reindexing
Hello,
first off - very nice tool 😄 I’ve played around with this crawler a bit today (combined with Tika + Tesseract OCR).
The initial indexing of 1.5 GB (8,000 files) took a while - which is fine, of course.
My main problem is that “reindexing” takes more time than I expected: for those 8,000 files it took about 2 minutes.
Is there any possibility to speed up that part? Configuration or similar?
Does it currently compare the file modification timestamp with the last-run timestamp? Or is it another approach?
Thanks in advance for any information 👍
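The incremental strategy the question asks about, comparing each file's modification time against the timestamp of the previous run, can be sketched roughly as follows. This is only an illustration of the general technique, not this crawler's actual implementation; `files_changed_since` is a hypothetical helper.

```python
import os
import tempfile
import time

def files_changed_since(root, last_run):
    """Yield paths under root whose mtime is newer than last_run.

    On reindex, a crawler can skip files whose modification time
    predates the previous run, so only changed files are re-parsed.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) > last_run:
                yield path

# Demo with a temporary directory: one stale file, one fresh file.
with tempfile.TemporaryDirectory() as root:
    old = os.path.join(root, "old.txt")
    new = os.path.join(root, "new.txt")
    for p in (old, new):
        with open(p, "w") as f:
            f.write("x")
    # Backdate old.txt by an hour; new.txt keeps its current mtime.
    stale = time.time() - 3600
    os.utime(old, (stale, stale))
    last_run = time.time() - 60  # pretend the last run was a minute ago
    changed = list(files_changed_since(root, last_run))
    print(changed)  # only new.txt is newer than last_run
```

Note that even this cheap check still has to stat every file, so a reindex of thousands of files is never free; the saving is in skipping the parse/OCR step, not the directory walk.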
Issue Analytics
- Created 7 years ago
- Comments: 8 (4 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Some ideas related to this topic:
tika-server
- Just provide a REST/HTTP API for it. With this split, it’s possible to create a general crawler for all systems and specialized ones for Unix/Windows.
Not for now. So basically you would like to be able to read any parameter either from settings or from the command line.
That makes sense to me. Can you open an issue for that?
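The request above, letting any parameter come from either a settings file or the command line, is a common layered-configuration pattern. A minimal sketch, with hypothetical parameter names (`update_rate`, `bulk_size`) and a plain dict standing in for the settings file:

```python
import argparse

# Stand-in for values loaded from a settings file (hypothetical keys).
SETTINGS = {"update_rate": "15m", "bulk_size": 100}

def load_config(argv=None):
    """Merge settings-file defaults with command-line overrides.

    Every parameter is accepted on the command line; anything not
    given there falls back to the settings-file value.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument("--update_rate")
    parser.add_argument("--bulk_size", type=int)
    args = parser.parse_args(argv)
    config = dict(SETTINGS)
    for key, value in vars(args).items():
        if value is not None:  # only explicit flags override defaults
            config[key] = value
    return config

cfg = load_config(["--bulk_size", "500"])
print(cfg)  # bulk_size overridden, update_rate kept from settings
```

The key design point is precedence: command-line flags win over the settings file, and absent flags (parsed as `None`) leave the defaults untouched.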