Same model & data, similarity scores changed
Hi there,
I’ve just upgraded to version 0.3.8 from 0.3.2. After the update, I noticed that running the same notebook I was working on, which finds the most similar texts to predefined queries, returned different rankings than before (e.g. one sentence that had similarity score about 0.9 wrt the query - and was therefore ranked as the most similar text in the corpus - is now ranked second with a score of about 0.6).
I am loading the same corpus, using the same text queries and the same model (distiluse-base-multilingual-cased) as before.
Any idea what may have caused this change? I realize that several releases occurred in between so it may be close to impossible to answer, but if you have any clue I’d be curious to know. Thanks in advance!
Issue Analytics
- Created 3 years ago
- Comments: 8 (6 by maintainers)
Hi, I uploaded the old version as distiluse-base-multilingual-cased-v1, which supports (only) 15 languages. The version that supports 50+ languages is uploaded as distiluse-base-multilingual-cased-v2.
If you are curious and want to look into this, here are a few lines of code demonstrating that embeddings do change when switching model versions:
resulting in:
🤔
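For anyone who wants to reproduce the comparison, here is a minimal sketch, not the code from the original comment. It assumes the library in question is sentence-transformers (the model name suggests this, but the thread does not say so) and that the two checkpoints are published as distiluse-base-multilingual-cased-v1 and distiluse-base-multilingual-cased-v2. The helper names `cosine`, `rank` and `load_encoder` are made up for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Encoder = Callable[[List[str]], List[Vector]]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank(encode: Encoder, query: str, corpus: List[str]) -> List[Tuple[str, float]]:
    """Score every corpus text against the query, best match first."""
    vectors = encode([query] + corpus)
    query_vec, doc_vecs = vectors[0], vectors[1:]
    scored = [(text, cosine(query_vec, vec)) for text, vec in zip(corpus, doc_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def load_encoder(model_name: str) -> Encoder:
    """Wrap a sentence-transformers model as an Encoder.

    Assumption: sentence-transformers is installed; it is imported lazily so
    the helpers above can be used without it.
    """
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    return lambda texts: model.encode(texts).tolist()


# Example usage (downloads both models; the model names are assumptions):
# query = "your query"
# corpus = ["first text", "second text"]
# for name in ("distiluse-base-multilingual-cased-v1",
#              "distiluse-base-multilingual-cased-v2"):
#     print(name, rank(load_encoder(name), query, corpus))
```

Running the same query and corpus through both names should show whether the scores and rankings differ. Pinning the explicit -v1 or -v2 name in a notebook avoids depending on whatever the unversioned name resolves to in a given release.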