Cross-Encoder outputs values greater than 1.0
According to https://sbert.net/examples/applications/retrieve_rerank/README.html#re-ranker-cross-encoder, the cross-encoder “outputs a single score between 0 and 1”. I do get such scores with some underlying models (transformers==4.6.1, sentence-transformers==1.2.0):
cross_encoder_model = CrossEncoder('cross-encoder/stsb-TinyBERT-L-4') # 0.95060706
But for:
cross_encoder_model = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-4-v2')
query_anchor_pairs = [
['i got the client VERY drunk', 'i got the client drunk'],
]
ce_scores = cross_encoder_model.predict(query_anchor_pairs)
print(ce_scores) # 8.14593
I get scores much greater than 1. What am I doing wrong? Btw, I am getting very good results with either model, especially the latter (Retrieve & Re-Rank). Thank you, Nils.
Issue Analytics
- State:
- Created: 2 years ago
- Reactions: 1
- Comments: 5 (3 by maintainers)
Hi @niebb, thanks for pointing this out. The documentation there is outdated.
Previously, a sigmoid was applied on top of the logits score, i.e. the output was sigmoid(logits). This gives scores between 0 and 1.
The new cross-encoders for MS MARCO output the logits directly, hence they can be below 0 or above 1. For re-ranking, this does not make any difference: the sigmoid is strictly monotonic, so applying it does not change the order of the results. If you like, you can call sigmoid() on top of these values to get back a score between 0 and 1.
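A small sketch of this point, using plain Python (no model needed). The 8.14593 logit is the score reported in the question above; the other logits are invented for illustration:

```python
import math

def sigmoid(x: float) -> float:
    # Squash a raw logit into the open interval (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

# Assumed raw logits from a cross-encoder; 8.14593 is the score
# reported above for cross-encoder/ms-marco-MiniLM-L-4-v2.
logits = [8.14593, 0.0, -2.7, 4.1]

scores = [sigmoid(x) for x in logits]
print(scores)  # every value now lies between 0 and 1

# sigmoid is strictly monotonic, so ranking by raw logits and ranking
# by sigmoid(logits) produce the same order.
assert sorted(logits, reverse=True) == sorted(logits, key=sigmoid, reverse=True)
```

Because the ordering is identical, normalizing with sigmoid is purely cosmetic for re-ranking; it only matters if you need scores interpretable as values between 0 and 1.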
You can check the MS MARCO examples for the cross-encoder. In principle, you need some pairs (query, passage) with label 0 (not relevant) or 1 (relevant).
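As a sketch of what such training pairs could look like — the queries and passages here are invented, not taken from MS MARCO, and the `InputExample` usage noted in the comment follows the sentence-transformers training API:

```python
# Hypothetical (query, passage, label) training triples in the style
# used for the MS MARCO cross-encoder examples. Texts are invented.
train_samples = [
    ("how does a cross-encoder work",
     "A cross-encoder feeds both sentences through the transformer jointly "
     "and outputs a single relevance score.",
     1.0),  # relevant
    ("how does a cross-encoder work",
     "The Eiffel Tower was completed in 1889.",
     0.0),  # not relevant
]

# For actual training, these would be wrapped for CrossEncoder.fit, e.g.:
#   from sentence_transformers import InputExample
#   examples = [InputExample(texts=[q, p], label=l) for q, p, l in train_samples]
# (kept as a comment so this sketch runs without the library installed)
for query, passage, label in train_samples:
    assert label in (0.0, 1.0), "binary relevance labels only"
    print(f"label={label}: {query!r} -> {passage[:40]!r}")
```

The key design point is that each query appears with both relevant and non-relevant passages, so the model learns to separate the two classes rather than memorizing queries.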