Documents with duplicate ids are being added normally
Hello. Thank you for your hard work on this package!
I think I may have encountered a problem with the way ids are treated. In particular, when two documents with the same id are added, both end up in the search pool. No error is raised and no upsert is carried out: both documents are added normally, as if they had different ids. I am not sure whether that is the expected behavior.
The following example code:
const Minisearch = require('minisearch')

async function run() {
  const minisearch = new Minisearch({
    fields: ['value']
  })
  minisearch.add({ id: 'b', value: 'bob' })
  minisearch.add({ id: 'b', value: 'boba' })
  const ans = minisearch.search('bob', { fuzzy: true })
  console.dir(ans, { depth: 4 })
}

run()
outputs:
[
  {
    id: 'b',
    terms: [ 'bob' ],
    score: 2.0794415416798357,
    match: { bob: [ 'value' ] }
  },
  {
    id: 'b',
    terms: [ 'boba' ],
    score: 0.6398773880082578,
    match: { boba: [ 'value' ] }
  }
]
For convenience, I have created a repository which reproduces the issue outlined in the example above.
Kind Regards, lilsweetcaligula
Issue Analytics
- State:
- Created 2 years ago
- Comments: 7 (4 by maintainers)
Top GitHub Comments
Hi @lilsweetcaligula, thanks for reporting this. I agree that it would be better to throw an error when a document is added with an id that is already in the index. Unfortunately, this is tricky to implement without additionally saving a map of all encountered ids, which would increase memory utilization on large indexes.
Let me think if there is a simple way to fix this.
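To make the trade-off above concrete, here is a minimal userland sketch of the "save all encountered ids" approach. The `UniqueIdIndex` wrapper and its `idField` parameter are hypothetical names used for illustration and are not part of MiniSearch. It only assumes that the wrapped index exposes an `add(document)` method, as in the example code earlier in the thread.

```javascript
// Hypothetical wrapper, not part of MiniSearch: rejects any document whose id
// has already been added. `index` is any object exposing add(document).
class UniqueIdIndex {
  constructor(index, idField = 'id') {
    this.index = index
    this.idField = idField
    // One entry per indexed document: this is the extra memory cost
    // mentioned above.
    this.seenIds = new Set()
  }

  add(document) {
    const id = document[this.idField]
    if (this.seenIds.has(id)) {
      throw new Error(`Duplicate document id: ${id}`)
    }
    // Record the id only after the underlying add succeeds.
    this.index.add(document)
    this.seenIds.add(id)
  }
}
```

Wrapping the instance from the example (`new UniqueIdIndex(minisearch)`) would make the second `add` with id 'b' throw instead of silently indexing a second document. The set grows by one entry per document, which is the memory concern raised above.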
I also just came across this issue. Unfortunately, simply ignoring duplicates would not work for me: the reason I am re-adding the same document is that I know the document itself has changed. My callback is also triggered with all documents at once, so removing a single document and re-adding it is not possible.
I know this is more of a userland problem, and I need to better identify the actual change that is happening, but I thought I would mention that this is the case.
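For the re-adding case described here, one possible userland workaround is to remember the last indexed version of each document and remove it before adding the new one. This is only a sketch: `UpsertingIndex` and `upsert` are hypothetical names, not MiniSearch API. It assumes the wrapped index exposes `add(document)` and `remove(document)`, and that `remove` must be given the document as it was originally indexed.

```javascript
// Hypothetical wrapper, not part of MiniSearch: replaces a previously indexed
// document with the same id instead of indexing it twice.
// `index` is any object exposing add(document) and remove(document).
class UpsertingIndex {
  constructor(index, idField = 'id') {
    this.index = index
    this.idField = idField
    // id -> copy of the document as it was last indexed
    this.documentsById = new Map()
  }

  upsert(document) {
    const id = document[this.idField]
    const previous = this.documentsById.get(id)
    if (previous !== undefined) {
      // Remove the old version first, using the stored copy so that later
      // mutations of the caller's object cannot affect removal.
      this.index.remove(previous)
    }
    this.index.add(document)
    this.documentsById.set(id, { ...document })
  }
}
```

With this, a callback that receives all documents can call `upsert` on each one without first working out which ones changed. The cost is a shallow copy of every indexed document held in memory, so it may not suit large indexes.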