Incomplete fulltext search results


JabRef version

5.5 (latest release)

Operating system

Windows

Details on version and operating system

Windows 10 21H2

Checked with the latest development build

  • I made a backup of my libraries before testing the latest development version.
  • I have tested the latest development version and the problem persists

Steps to reproduce the behaviour

When using the fulltext search with a simple single-keyword query, e.g. test, I get only partial results: a subset of the expected entries containing the text test is not displayed in the search results. When I open JabRef’s Lucene index in Luke and execute the same query (content:test), it returns all related entries, including those that are missing from JabRef’s search results.

The library in which I experience this has 400 entries. When I create a new library and add only one of the missing entries, the fulltext search returns it as expected. When I delete large portions (e.g. 350 entries) from my 400-entry library, the missing search result also starts to appear. This does not seem related to deleting a specific (potentially problematic) entry, as the result appears after different random selections of entries are removed. There is also no specific library size threshold that triggers this behavior: I was able to make the result appear after randomly cutting the library down to roughly 40 to 70 entries.

Appendix

No response

Issue Analytics

  • State: open
  • Created: a year ago
  • Comments: 13 (8 by maintainers)

Top GitHub Comments

btut commented, Mar 31, 2022

Now the question is, is this limitation on purpose?

Yes it is and we had quite some discussion when implementing it. The problem here is twofold:

  1. We do not sort search results by the Lucene score, because the metadata search has no Lucene score. For short queries that match many entries, the fulltext search would therefore be of little use, because one could not tell where the best hit is.
  2. It is also difficult to weigh the importance of the metadata fields against the fulltext results. In my opinion, the metadata results are more important. If all fulltext search results were allowed, the metadata search results would be flooded by mediocre fulltext results.

I think both of these issues can be solved by switching to Lucene for all searches. Metadata results could be weighted by Lucene, since they would use the same queries, and we could use the overall Lucene score to sort the entry table. (My wish would then be to also change how the fulltext search results are displayed and show them directly in the table instead of in the tab in the entry editor.)
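To illustrate the idea of one overall score, here is a minimal sketch in Python. It is not JabRef or Lucene code: the `Hit` class, the two weights, and the example scores are all hypothetical, chosen only to show how weighting metadata above fulltext could keep metadata matches from being flooded.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    citation_key: str
    metadata_score: float  # hypothetical relevance from fields like title or author
    fulltext_score: float  # hypothetical relevance from the linked PDF text


# Assumed weights: metadata counts twice as much as fulltext.
METADATA_WEIGHT = 2.0
FULLTEXT_WEIGHT = 1.0


def combined_score(hit: Hit) -> float:
    """Weighted sum of both scores, used as the single sort key."""
    return METADATA_WEIGHT * hit.metadata_score + FULLTEXT_WEIGHT * hit.fulltext_score


def rank(hits: list[Hit]) -> list[Hit]:
    """Sort hits by combined score, best first."""
    return sorted(hits, key=combined_score, reverse=True)


hits = [
    Hit("fulltext_only", metadata_score=0.0, fulltext_score=0.9),
    Hit("metadata_only", metadata_score=0.6, fulltext_score=0.0),
    Hit("both", metadata_score=0.3, fulltext_score=0.5),
]
ranked_keys = [hit.citation_key for hit in rank(hits)]
```

With these assumed weights, a moderate metadata match outranks a strong fulltext-only match, and no result has to be cut off to keep the table readable.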

protyposis commented, Mar 30, 2022

Thank you, that helped me figure out the problem. A search string of e.g. test results in the parsed Lucene query path:test content:test pageNumber:test modified:test annotations:test at https://github.com/JabRef/jabref/blob/7d4916ead08e340c65dd956286ae22c44ea8cc48/src/main/java/org/jabref/logic/pdf/search/retrieval/PdfSearcher.java#L69-L70

The problem here is maxHits, which is hardcoded to 5 in the search rules, e.g. at https://github.com/JabRef/jabref/blob/7d4916ead08e340c65dd956286ae22c44ea8cc48/src/main/java/org/jabref/model/search/rules/ContainBasedSearchRule.java#L97

I haven’t worked with Lucene in a long while, but it seems to me that the limit applies to each field separately, so the parsed query from above can yield 25 entries at most. Usual text queries don’t match the pageNumber or modified fields, yielding 15 results max, which I can also confirm from my testing.
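The arithmetic behind those numbers can be sketched with a toy model in Python. It only models the hypothesis stated above (a limit of 5 applied to each field separately); it does not call Lucene or JabRef, and the `limited_search` helper and the example matches are made up for illustration.

```python
# Toy model of the per-field limit hypothesis; not Lucene or JabRef code.
MAX_HITS = 5  # the hardcoded limit mentioned above

# Fields that appear in the parsed query
FIELDS = ["path", "content", "pageNumber", "modified", "annotations"]


def limited_search(matches_by_field, max_hits=MAX_HITS):
    """Return the union of the first max_hits matches of every field."""
    found = set()
    for field in FIELDS:
        found.update(matches_by_field.get(field, [])[:max_hits])
    return found


# Hypothetical library: 40 distinct entries match in each of three text fields,
# while pageNumber and modified match nothing.
matches = {
    "path": [f"path-entry-{i}" for i in range(40)],
    "content": [f"content-entry-{i}" for i in range(40)],
    "annotations": [f"annotation-entry-{i}" for i in range(40)],
}

upper_bound = len(FIELDS) * MAX_HITS  # 5 fields * 5 hits
shown = limited_search(matches)
print(upper_bound, len(shown))
```

In this model, 120 matching entries shrink to 15 displayed ones, which is consistent with the observation that results reappear once the library is cut down far enough.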

Now the question is, is this limitation on purpose?

This certainly prevents me from using JabRef for my use case: finding all relevant entries among all entries (or a subgroup) that contain, e.g., a specific keyword. Or, more generically: doing fulltext-based literature research within a library. Currently, the search can only tell me whether any relevant entry exists at all.

