Fine-tuning a pre-trained model for classification
Hi, thanks a lot for the great SBERT. I wanted to add a softmax layer on top of one of the pre-trained models and build a classifier, but I saw this and thought maybe there is no option for updating the weights of the pre-trained model; is this true?
If not: I wrote a customized Dataset class that calls model.tokenize(), just like SentenceDataset does. But when I built a dataset and passed it to a DataLoader, I got the following error:
RuntimeError: stack expects each tensor to be equal size, but got [295] at entry 0 and [954] at entry 1
I wonder whether I should call prepare_for_model after calling the tokenize method, or do something else.
Thanks in advance.
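For context on the error itself: the default DataLoader collate function tries to torch.stack the token-id tensors, which fails when the sentences have different lengths. Below is a minimal sketch of one workaround, assuming each dataset item is a (token-id tensor, label) pair and that 0 is the padding id; both the dataset layout and the names are assumptions, not details from the issue.

```python
# Sketch only: pad variable-length token-id tensors before batching so the
# default stacking no longer fails. Dataset layout and padding id (0) are assumptions.
import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader

def pad_collate(batch):
    # batch is a list of (token_ids, label) pairs with different sequence lengths
    token_ids = [item[0] for item in batch]
    labels = torch.tensor([item[1] for item in batch])
    padded_ids = pad_sequence(token_ids, batch_first=True, padding_value=0)  # [batch, max_len]
    attention_mask = (padded_ids != 0).long()  # mask out the padded positions
    return padded_ids, attention_mask, labels

# loader = DataLoader(my_tokenized_dataset, batch_size=16, collate_fn=pad_collate)
```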
Hi @aliosia, you usually get much better results if you use Transformers directly and fine-tune it on your sentiment classification task.
I don't know who brought this idea up in the community, but it was never a good idea to first map a sentence to an embedding and then use this embedding as the (only) feature for a classifier like logistic regression. Classifiers working directly on the text data have always outperformed these sentence embedding -> classifier constructions.
So for your case I recommend fine-tuning directly for classification and not using a sentence embedding in between.
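For reference, here is a minimal sketch of what fine-tuning a Transformer directly for classification could look like with the Hugging Face transformers library; the checkpoint name, example data, and hyperparameters are placeholders, not anything from this thread.

```python
# Sketch only: fine-tune a Transformer end-to-end for classification with the
# Hugging Face transformers library. Checkpoint, data and hyperparameters are placeholders.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

texts = ["great movie", "terrible plot"]          # placeholder training examples
labels = torch.tensor([1, 0])
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
outputs = model(**batch, labels=labels)           # cross-entropy loss computed internally
outputs.loss.backward()                           # gradients flow through all pre-trained weights
optimizer.step()
```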
Thanks a lot for your explanation @nreimers. I will certainly test the other approach more, but in my first try I got better results with SBERT features.
Also, the idea of first training with Siamese networks (contrastive loss or triplet loss) in an unsupervised way and then fine-tuning with the logistic loss for classification is not new; I remember that for nearly two years (around 2015) the state-of-the-art face classification models used both loss functions together. Hence, I think starting from a pre-trained network and fine-tuning with a classification loss seems reasonable.
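To make the approach discussed above concrete, here is a hedged sketch of starting from an SBERT-style pre-trained encoder, adding a softmax classification head, and fine-tuning the whole network so the classification loss also updates the pre-trained weights. It uses plain transformers rather than the sentence-transformers training loop, and the checkpoint name, mean pooling, and data are assumptions for illustration only.

```python
# Sketch only: an SBERT-style encoder with a softmax classification head,
# fine-tuned end-to-end so the pre-trained weights are updated as well.
# The checkpoint name and mean pooling are assumptions for illustration.
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel

class SentenceClassifier(nn.Module):
    def __init__(self, checkpoint="sentence-transformers/all-MiniLM-L6-v2", num_labels=2):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(checkpoint)      # stays trainable
        self.head = nn.Linear(self.encoder.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask):
        hidden = self.encoder(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        mask = attention_mask.unsqueeze(-1).float()
        # mean pooling over non-padding tokens, the usual SBERT-style pooling
        sentence_emb = (hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
        return self.head(sentence_emb)

tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
model = SentenceClassifier()
batch = tokenizer(["great movie", "terrible plot"], padding=True, truncation=True, return_tensors="pt")
logits = model(batch["input_ids"], batch["attention_mask"])
loss = nn.CrossEntropyLoss()(logits, torch.tensor([1, 0]))
loss.backward()   # the classification loss also updates the pre-trained encoder
```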