
Errors in cross-encoder inference

See original GitHub issue

Hi

I fine-tuned a cross-encoder model, starting from one of the Hugging Face models (link), on the STS dataset using your training script. When I load the model with the command below, it shows the following warning.

model = CrossEncoder('lordtt13/COVID-SciBERT', num_labels=1)

Some weights of the model checkpoint at lordtt13/COVID-SciBERT were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight', 'cls.predictions.decoder.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPretraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at lordtt13/COVID-SciBERT and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

Now, when I use the model after training:

  1. Inference is noticeably slower than with the cross-encoder models provided by sentence-transformers.
  2. It raises the following error for some longer input pairs (a possible workaround is sketched after this list): RuntimeError: The size of tensor a (535) must match the size of tensor b (512) at non-singleton dimension 1
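
The mismatch between 535 and 512 suggests that pairs longer than the model's 512-token limit reach the encoder without truncation. A minimal sketch of capping the input length via the max_length parameter of sentence-transformers' CrossEncoder (the model name is taken from the report above; the input pair is invented):

from sentence_transformers import CrossEncoder

# Truncate tokenized pairs at BERT's positional-embedding limit (512 tokens)
# so that long inputs no longer trigger the tensor-size mismatch.
model = CrossEncoder('lordtt13/COVID-SciBERT', num_labels=1, max_length=512)

scores = model.predict([
    ('a very long query ...', 'a very long passage ...'),
])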

Could you please tell me why this is happening, or whether I am missing something?

Many thanks,
Iknoor

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 11 (5 by maintainers)

Top GitHub Comments

2 reactions
nreimers commented, Nov 24, 2020

Hi @iknoorjobs
The NBoost models have the issue that they use two labels (relevant and not relevant). If you want to use such a model as a base, you need binary labels, i.e. int(0) and int(1).
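
For illustration, a minimal sketch of cross-encoder training with binary labels using the sentence-transformers API (the base model, example pairs, and hyperparameters are placeholders):

from torch.utils.data import DataLoader
from sentence_transformers import CrossEncoder, InputExample

# Each pair carries a binary relevance label: 1 = relevant, 0 = not relevant.
train_samples = [
    InputExample(texts=['what causes fever', 'Fever is often caused by infection.'], label=1),
    InputExample(texts=['what causes fever', 'The stock market closed higher today.'], label=0),
]

train_dataloader = DataLoader(train_samples, shuffle=True, batch_size=16)

# num_labels=1 gives a single relevance score per pair.
model = CrossEncoder('bert-base-uncased', num_labels=1)
model.fit(train_dataloader=train_dataloader, epochs=1, warmup_steps=100)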

I will release improved cross-encoder models for MS MARCO today that:

  1. Are quicker than the NBoost models
  2. Achieve better performance on the MS MARCO & TREC DL 2019 datasets
  3. Use only a single output to indicate whether query and passage are relevant (see the usage sketch below)
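
A single-output cross-encoder of this kind is typically used as sketched here; the model name is only an example of the MS MARCO cross-encoders later published on the Hugging Face hub:

from sentence_transformers import CrossEncoder

# Example model name; any single-output (num_labels=1) cross-encoder behaves the same.
model = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')

# One relevance score per (query, passage) pair; higher means more relevant.
scores = model.predict([
    ('how long do flu symptoms last', 'Flu symptoms usually last about a week.'),
    ('how long do flu symptoms last', 'The Eiffel Tower is in Paris.'),
])
print(scores)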

1 reaction
nreimers commented, Nov 24, 2020

@iknoorjobs The models are now online: https://github.com/UKPLab/sentence-transformers/tree/master/examples/applications/information-retrieval

“Can your cross-encoder training scripts be used to train the model if I have a dataset with binary labels?” Yes. The MS MARCO dataset has only binary labels (relevant or not relevant), which were encoded as 1 and 0.

It will also work if you have more fine-grained labels, like 0, 0.5, 0.8, and 1, as sketched below.
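
A short sketch of the same training setup with graded relevance labels (the pairs and scores are invented); training then proceeds exactly as in the binary example above:

from sentence_transformers import InputExample

# Graded relevance: with a single-output head, any float label in [0, 1] works.
train_samples = [
    InputExample(texts=['query', 'highly relevant passage'], label=1.0),
    InputExample(texts=['query', 'partially relevant passage'], label=0.8),
    InputExample(texts=['query', 'loosely related passage'], label=0.5),
    InputExample(texts=['query', 'irrelevant passage'], label=0.0),
]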

