Indexing through symlinks
This is just initial playing around, sorry if this is well-known. I made a robust04 directory and symlinked in my normal locations for the CD45-cr subcollections, but indexing didn't work. It seems that the filesystem walker either doesn't follow symlinks or doesn't cross NFS boundaries:
$ cat log.robust04.pos+docvectors+rawdocs
2018-11-21 15:21:56,782 INFO [main] index.IndexCollection (IndexCollection.java:248) - DocumentCollection path: /Users/soboroff/robust04
2018-11-21 15:21:56,783 INFO [main] index.IndexCollection (IndexCollection.java:249) - Index path: lucene-index.robust04.pos+docvectors
2018-11-21 15:21:56,783 INFO [main] index.IndexCollection (IndexCollection.java:250) - CollectionClass: TrecCollection
2018-11-21 15:21:56,783 INFO [main] index.IndexCollection (IndexCollection.java:251) - Generator: JsoupGenerator
2018-11-21 15:21:56,783 INFO [main] index.IndexCollection (IndexCollection.java:252) - Threads: 16
2018-11-21 15:21:56,784 INFO [main] index.IndexCollection (IndexCollection.java:253) - Stemmer: porter
2018-11-21 15:21:56,784 INFO [main] index.IndexCollection (IndexCollection.java:254) - Keep stopwords? false
2018-11-21 15:21:56,784 INFO [main] index.IndexCollection (IndexCollection.java:255) - Store positions? true
2018-11-21 15:21:56,784 INFO [main] index.IndexCollection (IndexCollection.java:256) - Store docvectors? true
2018-11-21 15:21:56,784 INFO [main] index.IndexCollection (IndexCollection.java:257) - Store transformed docs? false
2018-11-21 15:21:56,784 INFO [main] index.IndexCollection (IndexCollection.java:258) - Store raw docs? true
2018-11-21 15:21:56,784 INFO [main] index.IndexCollection (IndexCollection.java:259) - Optimize (merge segments)? false
2018-11-21 15:21:56,784 INFO [main] index.IndexCollection (IndexCollection.java:260) - Whitelist: null
2018-11-21 15:21:56,799 INFO [main] index.IndexCollection (IndexCollection.java:291) - Starting indexer...
2018-11-21 15:21:57,117 INFO [main] index.IndexCollection (IndexCollection.java:314) - 4 files found in /Users/soboroff/robust04
2018-11-21 15:21:57,157 ERROR [pool-2-thread-1] index.IndexCollection$IndexerThread (IndexCollection.java:231) - pool-2-thread-1: Unexpected Exception:
java.io.FileNotFoundException: /Users/soboroff/robust04/FR94 (Is a directory)
at java.io.FileInputStream.open0(Native Method) ~[?:1.8.0_141]
at java.io.FileInputStream.open(FileInputStream.java:195) ~[?:1.8.0_141]
at java.io.FileInputStream.<init>(FileInputStream.java:138) ~[?:1.8.0_141]
at java.io.FileInputStream.<init>(FileInputStream.java:93) ~[?:1.8.0_141]
at java.io.FileReader.<init>(FileReader.java:58) ~[?:1.8.0_141]
at io.anserini.collection.TrecCollection$FileSegment.<init>(TrecCollection.java:83) ~[anserini-0.2.1-SNAPSHOT.jar:?]
at io.anserini.collection.TrecCollection.createFileSegment(TrecCollection.java:59) ~[anserini-0.2.1-SNAPSHOT.jar:?]
at io.anserini.collection.TrecCollection.createFileSegment(TrecCollection.java:43) ~[anserini-0.2.1-SNAPSHOT.jar:?]
at io.anserini.index.IndexCollection$IndexerThread.run(IndexCollection.java:187) [anserini-0.2.1-SNAPSHOT.jar:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_141]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_141]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_141]
2018-11-21 15:21:57,157 ERROR [pool-2-thread-2] index.IndexCollection$IndexerThread (IndexCollection.java:231) - pool-2-thread-2: Unexpected Exception:
java.io.FileNotFoundException: /Users/soboroff/robust04/LATIMES (Is a directory)
at java.io.FileInputStream.open0(Native Method) ~[?:1.8.0_141]
at java.io.FileInputStream.open(FileInputStream.java:195) ~[?:1.8.0_141]
at java.io.FileInputStream.<init>(FileInputStream.java:138) ~[?:1.8.0_141]
at java.io.FileInputStream.<init>(FileInputStream.java:93) ~[?:1.8.0_141]
at java.io.FileReader.<init>(FileReader.java:58) ~[?:1.8.0_141]
at io.anserini.collection.TrecCollection$FileSegment.<init>(TrecCollection.java:83) ~[anserini-0.2.1-SNAPSHOT.jar:?]
at io.anserini.collection.TrecCollection.createFileSegment(TrecCollection.java:59) ~[anserini-0.2.1-SNAPSHOT.jar:?]
at io.anserini.collection.TrecCollection.createFileSegment(TrecCollection.java:43) ~[anserini-0.2.1-SNAPSHOT.jar:?]
at io.anserini.index.IndexCollection$IndexerThread.run(IndexCollection.java:187) [anserini-0.2.1-SNAPSHOT.jar:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_141]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_141]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_141]
2018-11-21 15:21:57,157 ERROR [pool-2-thread-3] index.IndexCollection$IndexerThread (IndexCollection.java:231) - pool-2-thread-3: Unexpected Exception:
java.io.FileNotFoundException: /Users/soboroff/robust04/FT (Is a directory)
at java.io.FileInputStream.open0(Native Method) ~[?:1.8.0_141]
at java.io.FileInputStream.open(FileInputStream.java:195) ~[?:1.8.0_141]
at java.io.FileInputStream.<init>(FileInputStream.java:138) ~[?:1.8.0_141]
at java.io.FileInputStream.<init>(FileInputStream.java:93) ~[?:1.8.0_141]
at java.io.FileReader.<init>(FileReader.java:58) ~[?:1.8.0_141]
at io.anserini.collection.TrecCollection$FileSegment.<init>(TrecCollection.java:83) ~[anserini-0.2.1-SNAPSHOT.jar:?]
at io.anserini.collection.TrecCollection.createFileSegment(TrecCollection.java:59) ~[anserini-0.2.1-SNAPSHOT.jar:?]
at io.anserini.collection.TrecCollection.createFileSegment(TrecCollection.java:43) ~[anserini-0.2.1-SNAPSHOT.jar:?]
at io.anserini.index.IndexCollection$IndexerThread.run(IndexCollection.java:187) [anserini-0.2.1-SNAPSHOT.jar:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_141]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_141]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_141]
2018-11-21 15:21:57,157 ERROR [pool-2-thread-4] index.IndexCollection$IndexerThread (IndexCollection.java:231) - pool-2-thread-4: Unexpected Exception:
java.io.FileNotFoundException: /Users/soboroff/robust04/FBIS (Is a directory)
at java.io.FileInputStream.open0(Native Method) ~[?:1.8.0_141]
at java.io.FileInputStream.open(FileInputStream.java:195) ~[?:1.8.0_141]
at java.io.FileInputStream.<init>(FileInputStream.java:138) ~[?:1.8.0_141]
at java.io.FileInputStream.<init>(FileInputStream.java:93) ~[?:1.8.0_141]
at java.io.FileReader.<init>(FileReader.java:58) ~[?:1.8.0_141]
at io.anserini.collection.TrecCollection$FileSegment.<init>(TrecCollection.java:83) ~[anserini-0.2.1-SNAPSHOT.jar:?]
at io.anserini.collection.TrecCollection.createFileSegment(TrecCollection.java:59) ~[anserini-0.2.1-SNAPSHOT.jar:?]
at io.anserini.collection.TrecCollection.createFileSegment(TrecCollection.java:43) ~[anserini-0.2.1-SNAPSHOT.jar:?]
at io.anserini.index.IndexCollection$IndexerThread.run(IndexCollection.java:187) [anserini-0.2.1-SNAPSHOT.jar:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_141]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_141]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_141]
2018-11-21 15:21:57,187 INFO [main] index.IndexCollection (IndexCollection.java:359) - # Final Counter Values
2018-11-21 15:21:57,187 INFO [main] index.IndexCollection (IndexCollection.java:360) - indexed: 0
2018-11-21 15:21:57,187 INFO [main] index.IndexCollection (IndexCollection.java:361) - empty: 0
2018-11-21 15:21:57,188 INFO [main] index.IndexCollection (IndexCollection.java:362) - unindexed: 0
2018-11-21 15:21:57,188 INFO [main] index.IndexCollection (IndexCollection.java:363) - unindexable: 0
2018-11-21 15:21:57,188 INFO [main] index.IndexCollection (IndexCollection.java:364) - skipped: 0
2018-11-21 15:21:57,188 INFO [main] index.IndexCollection (IndexCollection.java:365) - errors: 0
2018-11-21 15:21:57,197 INFO [main] index.IndexCollection (IndexCollection.java:368) - Total 0 documents indexed in 00:00:00
Issue Analytics
- Created 5 years ago
- Comments: 8 (3 by maintainers)
Top GitHub Comments
@isoboroff Yup, this is what does the right thing: https://github.com/castorini/Anserini/blob/master/src/main/java/io/anserini/collection/TrecCollection.java#L51
Confirming it detects 0 documents in the DTD, AUX/, and .C files in those trees. Maybe you knew that already.
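For anyone landing here with the same log: "4 files found" followed by four "(Is a directory)" errors is consistent with a walk that does not follow links, since `Files.walkFileTree` by default hands a symlink to `visitFile` instead of descending into its target. Below is a minimal sketch of a walk that does follow links. It is not the Anserini code linked above; the class name `SymlinkWalk` and the method `discover` are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.FileVisitOption;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;

// Hypothetical helper for illustration only; not part of Anserini.
final class SymlinkWalk {

  // Collects regular files under root, descending into symlinked directories.
  static List<Path> discover(Path root) throws IOException {
    List<Path> files = new ArrayList<>();
    Files.walkFileTree(root, EnumSet.of(FileVisitOption.FOLLOW_LINKS), Integer.MAX_VALUE,
        new SimpleFileVisitor<Path>() {
          @Override
          public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
            // With FOLLOW_LINKS, attrs describe the link target, so a symlink
            // to a directory is walked rather than reported here as a file.
            if (attrs.isRegularFile()) {
              files.add(file);
            }
            return FileVisitResult.CONTINUE;
          }

          @Override
          public FileVisitResult visitFileFailed(Path file, IOException e) {
            // Broken links and link cycles end up here; skip them and keep walking.
            return FileVisitResult.CONTINUE;
          }
        });
    return files;
  }
}
```

Calling `SymlinkWalk.discover(Paths.get("/Users/soboroff/robust04"))` on a layout like the one in this issue should return the files inside FR94, LATIMES, FT, and FBIS rather than the four links themselves. Filtering out non-document files (the DTD, AUX/, and .C files mentioned above) would still be up to the collection class.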