Fine-tuning a pre-trained model for classification
Hi, thanks a lot for the great SBERT. I wanted to add a softmax layer on top of one of the pre-trained models and build a classifier, but I saw this and thought maybe there is no option for updating the weights of the pre-trained model; is this true?
If not: I wrote a customized Dataset class that calls model.tokenize(), just like SentenceDataset does. But when I built a dataset and passed it to a DataLoader, I got the following error:
RuntimeError: stack expects each tensor to be equal size, but got [295] at entry 0 and [954] at entry 1
I wonder whether I should call prepare_for_model after calling the tokenize method, or do something else.
Thanks in advance.
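For context on the error itself: the default DataLoader collate function tries to torch.stack the token-id tensors, which fails when the sentences have different lengths. Below is a minimal sketch of one workaround, assuming each dataset item is a (token-id tensor, label) pair and that 0 is the padding id; both the dataset layout and the names are assumptions, not details from the issue.

```python
# Sketch only: pad variable-length token-id tensors before batching so the
# default stacking no longer fails. Dataset layout and padding id (0) are assumptions.
import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader

def pad_collate(batch):
    # batch is a list of (token_ids, label) pairs with different sequence lengths
    token_ids = [item[0] for item in batch]
    labels = torch.tensor([item[1] for item in batch])
    padded_ids = pad_sequence(token_ids, batch_first=True, padding_value=0)  # [batch, max_len]
    attention_mask = (padded_ids != 0).long()  # mask out the padded positions
    return padded_ids, attention_mask, labels

# loader = DataLoader(my_tokenized_dataset, batch_size=16, collate_fn=pad_collate)
```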
Hi @aliosia, you usually get much better results if you use Transformers directly and fine-tune it on your sentiment classification task.
I don't know who brought this idea up in the community, but it was never a good idea to first map a sentence to an embedding and then use this embedding as the (only) feature for a classifier like logistic regression. Classifiers working directly on the text data have always outperformed these sentence embedding -> classifier constructions.
So for your case I recommend fine-tuning directly for classification and not using a sentence embedding in between.
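For reference, here is a minimal sketch of what fine-tuning a Transformer directly for classification could look like with the Hugging Face transformers library; the checkpoint name, example data, and hyperparameters are placeholders, not anything from this thread.

```python
# Sketch only: fine-tune a Transformer end-to-end for classification with the
# Hugging Face transformers library. Checkpoint, data and hyperparameters are placeholders.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

texts = ["great movie", "terrible plot"]          # placeholder training examples
labels = torch.tensor([1, 0])
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
outputs = model(**batch, labels=labels)           # cross-entropy loss computed internally
outputs.loss.backward()                           # gradients flow through all pre-trained weights
optimizer.step()
```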
Thanks a lot for your explanation @nreimers. I will certainly test the other approach more, but in my first try I got better results with SBERT features.
Also, the idea of first training with Siamese networks (contrastive loss or triplet loss) in an unsupervised way and then fine-tuning with the logistic loss for classification is not new; I remember that for nearly two years (around 2015) the state-of-the-art face classification models used both loss functions together. Hence, I think starting from a pre-trained network and fine-tuning with a classification loss seems reasonable.
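To make the approach discussed above concrete, here is a hedged sketch of starting from an SBERT-style pre-trained encoder, adding a softmax classification head, and fine-tuning the whole network so the classification loss also updates the pre-trained weights. It uses plain transformers rather than the sentence-transformers training loop, and the checkpoint name, mean pooling, and data are assumptions for illustration only.

```python
# Sketch only: an SBERT-style encoder with a softmax classification head,
# fine-tuned end-to-end so the pre-trained weights are updated as well.
# The checkpoint name and mean pooling are assumptions for illustration.
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel

class SentenceClassifier(nn.Module):
    def __init__(self, checkpoint="sentence-transformers/all-MiniLM-L6-v2", num_labels=2):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(checkpoint)      # stays trainable
        self.head = nn.Linear(self.encoder.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask):
        hidden = self.encoder(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        mask = attention_mask.unsqueeze(-1).float()
        # mean pooling over non-padding tokens, the usual SBERT-style pooling
        sentence_emb = (hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
        return self.head(sentence_emb)

tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
model = SentenceClassifier()
batch = tokenizer(["great movie", "terrible plot"], padding=True, truncation=True, return_tensors="pt")
logits = model(batch["input_ids"], batch["attention_mask"])
loss = nn.CrossEntropyLoss()(logits, torch.tensor([1, 0]))
loss.backward()   # the classification loss also updates the pre-trained encoder
```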