Not clear which layers are being trained
In your paper, you state: “The purpose of SBERT sentence embeddings are not to be used for transfer learning for other tasks. Here, we think fine-tuning BERT as described by Devlin et al. (2018) for new tasks is the more suitable method, as it updates all layers of the BERT network.”
The above implies that the SentenceTransformer.fit() method does not update all layers of BERT (or XLNet, etc.). However, there is no code in this repo that partially freezes any layers during training, or anything like that. Could you kindly clarify how you control which layers are trained (other than the parameters in the loss modules, which are clearly trained)? Sorry if this is a stupid question.
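For context on the question: fit() in this repo is an ordinary PyTorch training loop, so every parameter with requires_grad=True gets updated, and nothing is frozen unless you freeze it yourself. A minimal sketch of that PyTorch convention, using a small stand-in module rather than a real SBERT model:

```python
import torch.nn as nn

# Small stand-in for a transformer encoder; SentenceTransformer
# modules are ordinary nn.Modules, so the same convention applies.
model = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 2))

# By default every parameter tensor is trainable (requires_grad=True),
# so an optimizer built from model.parameters() updates all layers.
trainable = [p for p in model.parameters() if p.requires_grad]
print(len(trainable))  # 4: weight + bias for each of the two Linear layers

# To train only some layers, you would have to freeze the rest explicitly:
for p in model[0].parameters():
    p.requires_grad_(False)

still_trainable = [p for p in model.parameters() if p.requires_grad]
print(len(still_trainable))  # 2: only the last Linear remains trainable
```

Since, as noted above, the repo never does the explicit freezing step, fit() updates all layers of the underlying transformer plus any parameters in the loss modules.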
Issue Analytics
- State:
- Created: 4 years ago
- Reactions: 1
- Comments: 6 (2 by maintainers)
@peacej The statement was in reference to sentence classification tasks, like sentiment classification. These tasks are evaluated, for example, with SentEval.
There (in SentEval), the assumption is: Sentence --(fixed sentence embedding method)--> Vector --(logistic regression)--> Label
In that SentEval scenario, only the logistic regression classifier is updated. The sentence embedding method is fixed.
If you have a sentence classification task, like sentiment classification, I think this setup doesn’t make sense. There, it is better to fine-tune BERT: Sentence --(BERT)--> Label
Sentence embedding methods make sense for unsupervised tasks, and for tasks where you use cosine similarity / Manhattan / Euclidean distance for retrieval, etc.
I hope this makes it a bit more clear.
Best,
Nils Reimers
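The retrieval setup described in the answer above can be sketched with plain cosine similarity. Random vectors stand in for real SBERT embeddings here; in practice you would obtain them from SentenceTransformer.encode():

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Stand-ins for fixed sentence embeddings (e.g. from model.encode(...));
# real SBERT vectors would have dimension 384 or 768.
corpus_emb = torch.randn(5, 16)
query_emb = corpus_emb[2].clone()  # pretend the query matches sentence 2

# Cosine similarity of the query against every corpus sentence.
sims = F.cosine_similarity(query_emb.unsqueeze(0), corpus_emb, dim=1)

# Rank corpus sentences by similarity. The embedding method itself stays
# fixed -- only a nearest-neighbour lookup happens at query time.
ranking = sims.argsort(descending=True)
print(ranking[0].item())  # 2: the identical sentence ranks first
```

This is the sense in which the embedding method is "fixed": no gradients flow at query time, in contrast to the full fine-tuning setup for classification.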
@peacej I have a similar situation with a specialized domain. What I do is a two-step training process:
1. Fine-tune BERT in an unsupervised fashion on my domain text (via Hugging Face’s transformers repo).
2. Fit a SentenceTransformer on labeled pairs, using the weights from step 1 as the base model.
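Step 2 of the recipe above can be sketched with the standard sentence-transformers building blocks. The checkpoint path and training pairs below are placeholders, not values from this thread:

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, models, InputExample, losses

# Load the domain-adapted checkpoint from step 1 as the word-embedding layer.
# 'path/to/domain-finetuned-bert' is a placeholder for your own output dir.
word_embedding_model = models.Transformer('path/to/domain-finetuned-bert')
pooling = models.Pooling(word_embedding_model.get_word_embedding_dimension())
model = SentenceTransformer(modules=[word_embedding_model, pooling])

# Labeled pairs for step 2 (placeholder sentences; labels are similarity scores).
train_examples = [
    InputExample(texts=['first sentence', 'second sentence'], label=0.9),
    InputExample(texts=['third sentence', 'unrelated sentence'], label=0.1),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)
train_loss = losses.CosineSimilarityLoss(model)

# fit() runs a plain PyTorch loop and, per the discussion above,
# updates all layers of the transformer plus the pooling module.
model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1)
```

Note that fit() here continues training from the domain-adapted weights rather than from the stock pretrained checkpoint, which is the whole point of doing step 1 first.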