
Fine-tuning a pre-trained model for classification


Hi, thanks a lot for the great SBERT. I wanted to add a softmax layer on top of one of the pre-trained models and build a classifier, but I saw this and thought maybe there is no option for updating the weights of the pre-trained model; is this true?

If not: I wrote a customized Dataset class and called model.tokenize() in it, just like SentenceDataset. But when I built a dataset and passed it to a DataLoader, I got the following error:
RuntimeError: stack expects each tensor to be equal size, but got [295] at entry 0 and [954] at entry 1
I wonder if I should call prepare_for_model after calling the tokenize method, or something else?

Thanks in advance.
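
For context on the RuntimeError above: PyTorch's default collate_fn tries to torch.stack the per-example tensors and fails when the tokenized sequences have different lengths, so the batch has to be padded. One minimal sketch, using a plain Hugging Face tokenizer rather than the issue's custom Dataset; the checkpoint name, the example texts, and the RawTextDataset class are placeholders, not taken from the issue:

```python
import torch
from torch.utils.data import Dataset, DataLoader
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # placeholder checkpoint

class RawTextDataset(Dataset):
    """Stores raw strings; tokenization and padding happen per batch in the collate_fn."""
    def __init__(self, sentences, labels):
        self.sentences, self.labels = sentences, labels
    def __len__(self):
        return len(self.sentences)
    def __getitem__(self, i):
        return self.sentences[i], self.labels[i]

def collate(batch):
    texts, labels = zip(*batch)
    # padding=True pads every sequence to the longest one in the batch,
    # so the resulting tensors have equal size and can be stacked
    enc = tokenizer(list(texts), padding=True, truncation=True, return_tensors="pt")
    enc["labels"] = torch.tensor(labels)
    return enc

loader = DataLoader(
    RawTextDataset(["a short sentence", "a somewhat longer example sentence"], [0, 1]),
    batch_size=2, collate_fn=collate)
batch = next(iter(loader))  # dict of equal-size tensors, ready for a model
```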

Issue Analytics

  • State: open
  • Created: 3 years ago
  • Reactions: 1
  • Comments: 9 (3 by maintainers)

Top GitHub Comments

4 reactions
aliosia commented, Oct 12, 2020

Thanks a lot for your explanation @nreimers. I will surely test the other way more, but in my first try I got better results with SBERT features.

Also, the idea of first training with Siamese networks (contrastive loss or triplet loss) in an unsupervised way, and then fine-tuning with a logistic loss for classification, is not new; I remember that for nearly two years (around 2015) the state-of-the-art face classification models used both loss functions together. Hence, I think starting from a pre-trained network and fine-tuning with a classification loss seems reasonable.
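
A rough illustration of the two-stage idea above (pre-trained sentence encoder plus a trainable softmax head) might look like the sketch below. It assumes a recent sentence-transformers version in which model.tokenize returns a batched, padded features dict and the forward pass adds a 'sentence_embedding' key; the model name, label count, and toy inputs are placeholders, not from the thread.

```python
import torch
from torch import nn
from sentence_transformers import SentenceTransformer

class SBERTClassifier(nn.Module):
    """Cross-entropy (softmax) head on top of a pre-trained SBERT encoder; the encoder stays trainable."""
    def __init__(self, model_name="all-MiniLM-L6-v2", num_labels=2):  # placeholder name and label count
        super().__init__()
        self.encoder = SentenceTransformer(model_name)
        dim = self.encoder.get_sentence_embedding_dimension()
        self.head = nn.Linear(dim, num_labels)

    def forward(self, sentences):
        features = self.encoder.tokenize(sentences)            # batched, padded features dict
        features = {k: v.to(self.head.weight.device) for k, v in features.items()}
        out = self.encoder(features)                           # adds 'sentence_embedding'
        return self.head(out["sentence_embedding"])

model = SBERTClassifier()
logits = model(["a clearly positive example", "a clearly negative one"])
loss = nn.functional.cross_entropy(logits, torch.tensor([1, 0]))
loss.backward()  # gradients reach the encoder, so the pre-trained weights do get updated
```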

4 reactions
nreimers commented, Oct 12, 2020

Hi @aliosia,
You usually get much better results if you use Transformers directly and fine-tune it on your sentiment classification task.

I don't know who brought this idea up in the community, but it was never a good idea to first map a sentence to an embedding and then use this embedding as the (only) feature for a classifier like logistic regression. Classifiers working directly on the text data have always outperformed these sentence embedding -> classifier constructions.

So for your case I recommend fine-tuning directly for classification and not using a sentence embedding in between.
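
What fine-tuning directly with Transformers can look like in practice, as a minimal sketch; the checkpoint, the toy texts/labels, and the hyperparameters below are placeholders rather than anything recommended in the thread:

```python
import torch
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

checkpoint = "bert-base-uncased"              # placeholder checkpoint
texts = ["great movie", "terrible plot"]      # toy data
labels = [1, 0]

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

class TextDataset(torch.utils.data.Dataset):
    def __init__(self, texts, labels):
        # pad/truncate up front so every example in the dataset has the same length
        self.enc = tokenizer(texts, truncation=True, padding=True)
        self.labels = labels
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, i):
        item = {k: torch.tensor(v[i]) for k, v in self.enc.items()}
        item["labels"] = torch.tensor(self.labels[i])
        return item

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=TextDataset(texts, labels),
)
trainer.train()  # fine-tunes all weights of the pre-trained model end to end
```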


Top Results From Across the Web

  • Fine-tune a pretrained model - Hugging Face
    When you use a pretrained model, you train it on a dataset specific to your task. This is known as fine-tuning, an incredibly...
  • Transfer learning and fine-tuning | TensorFlow Core
    A pre-trained model is a saved network that was previously trained on a large dataset, typically on a large-scale image-classification task.
  • Transfer Learning | Pretrained Models in Deep Learning
    Ways to fine tune your model; Use the pre-trained model for ... The objective was to classify the images into one of the...
  • Fine-tuning pretrained NLP models with Huggingface's Trainer
    A simple way to fine-tune pretrained NLP models without native Pytorch or Tensorflow · Intermediate understanding of Python · Basic understanding ...
  • Finetuning Torchvision Models - PyTorch
    In finetuning, we start with a pretrained model and update all of the model's parameters for our new task, in essence retraining the...
