Indexed documents vs crawled documents
Hi,

Any idea why I have only a small number of indexed documents compared to crawled documents? Fess shows I have 200k+ documents, but in reality only 12k+ are in my index and searchable; I cannot search all 200k documents. The crawling job has finished and nothing else is running. How can I check what happened to the other documents? I have already tried the logs. At this point this is the biggest issue I'm facing: how can I get the indexed count closer to the number of crawled documents?
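One way to narrow this down is to check how many documents are actually stored in the search index, independently of the counter the admin UI reports. Below is a minimal sketch, assuming Fess is backed by Elasticsearch on localhost:9200 and that the document index is named fess.search (both are assumptions; older Fess versions use different index names, so adjust to your setup):

```python
# Hedged sketch: report how many documents are actually searchable by asking
# the backing Elasticsearch index for its document count.
import json
import urllib.request

ES_URL = "http://localhost:9200"   # assumed Elasticsearch endpoint
INDEX = "fess.search"              # assumed name of the Fess document index

with urllib.request.urlopen(f"{ES_URL}/{INDEX}/_count") as resp:
    body = json.load(resp)

print(f"Documents searchable in '{INDEX}': {body['count']}")
```

If this number matches the 12k+ you can search, the missing documents were never written to the index in the first place, which points back at the crawler run itself rather than the search side.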
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
JVM options for the Crawler are in fess_config.properties:
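As a hedged illustration of what that can look like (the key name jvm.crawler.options and the values shown here are assumptions that vary between Fess versions, so compare against your own fess_config.properties before editing), raising the crawler heap is a typical adjustment when many documents fail to make it into the index:

```properties
# Illustrative crawler JVM options in fess_config.properties; the key name and
# the defaults differ between Fess versions -- verify against your install.
jvm.crawler.options=\
-Djava.awt.headless=true\n\
-server\n\
-Xms128m\n\
-Xmx1g\n
```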
To do remote debugging, change the script in Admin > System > Scheduler > Default Crawler, and also change the log level to “debug”.
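The concrete script change is not shown above; a generic, hedged alternative for attaching a remote debugger to the crawler process is to add the standard JDWP agent flag to the crawler JVM options mentioned earlier (key name and surrounding values assumed as before):

```properties
# Illustrative only: crawler JVM options with the standard JDWP agent flag
# appended so a remote debugger can attach on port 8000; suspend=y makes the
# crawler wait for the debugger to attach before it starts crawling.
jvm.crawler.options=\
-Djava.awt.headless=true\n\
-server\n\
-Xmx1g\n\
-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=8000\n
```

With suspend=y the crawler JVM blocks until a debugger attaches; use suspend=n if you only want the option of attaching later without delaying the crawl.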