How to freeze layers of BERT?
How can I freeze all layers of BERT and train only the task-specific layers during the fine-tuning process?
We can do it in pytorch-pretrained-BERT by setting requires_grad = False for all of BERT's layers, but is there any way to do the same in the TensorFlow code?
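For reference, the PyTorch approach mentioned above might look roughly like the following minimal sketch. It assumes pytorch-pretrained-BERT's BertForSequenceClassification, where the encoder is exposed as model.bert and the task head as model.classifier; the checkpoint name is illustrative.

from pytorch_pretrained_bert import BertForSequenceClassification

# Load a pretrained classifier; the encoder lives under model.bert,
# while the task head (model.classifier) stays trainable.
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)

# Freeze every parameter of the BERT encoder.
for param in model.bert.parameters():
    param.requires_grad = False

# Give the optimizer only the parameters that still require gradients.
trainable_params = [p for p in model.parameters() if p.requires_grad]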
I added the code below to the create_optimizer function in optimization.py:

tvars = tf.trainable_variables()
tvars = [v for v in tvars if 'bert' not in v.name]  # my code (freeze all layers of BERT)
grads = tf.gradients(loss, tvars)

Is that correct?
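For context, here is a minimal sketch of how that filtered variable list plugs into the rest of the gradient step. Assumptions: TF 1.x graph mode as in the original BERT repo, a plain AdamOptimizer standing in for the repo's AdamWeightDecayOptimizer, and build_frozen_train_op is an illustrative helper name.

import tensorflow as tf  # TF 1.x API, as used by the original BERT code

def build_frozen_train_op(loss, learning_rate, global_step):
    # Collect all trainable variables, then drop everything under the 'bert/' scope
    # so that only the task-specific head receives gradients and updates.
    tvars = tf.trainable_variables()
    tvars = [v for v in tvars if 'bert' not in v.name]

    grads = tf.gradients(loss, tvars)
    (grads, _) = tf.clip_by_global_norm(grads, clip_norm=1.0)

    optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
    return optimizer.apply_gradients(zip(grads, tvars), global_step=global_step)

Because gradients are only computed for the filtered list, the BERT weights are left untouched while the head is trained.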
@shimafoolad
I don't understand your question, but check out my fork of BERT.
This is the part that makes sure only the layers added on top of BERT are updated during fine-tuning.
I've also written a script that compares the weights in two checkpoint files and prints the weights that differ. I fine-tuned BERT on CoLA and compared the checkpoint files at step 0 and step 267. As expected, only the weights associated with output_weights and output_bias are different. I hope this answers your question.
@hkvision Try fine-tuning for more epochs with a higher learning rate. I fine-tuned on the CoLA dataset using the default hyperparameters, and here are my results after 5 epochs:
This is what I get after 50 epochs and progressively increasing the learning rate by a few orders of magnitude:
@OYE93
Have a look at this line. tvars now contains a list of all the weights outside BERT. You will need to add to it the params from layer 11 onwards; you can check the checkpoint files for how these weights are named.
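A sketch of selecting the head plus encoder layers 11 and up by name could look like this. Assumptions: TF 1.x, variable names following the 'bert/encoder/layer_N/...' pattern used in the public BERT checkpoints, and select_trainable_vars is an illustrative helper name.

import re
import tensorflow as tf

def select_trainable_vars(min_layer=11):
    # Keep the task head (anything outside the 'bert/' scope) plus
    # BERT encoder layers whose index is >= min_layer.
    selected = []
    for var in tf.trainable_variables():
        if 'bert' not in var.name:
            selected.append(var)
            continue
        match = re.search(r'bert/encoder/layer_(\d+)/', var.name)
        if match and int(match.group(1)) >= min_layer:
            selected.append(var)
    return selected

The returned list would then replace tvars before the tf.gradients call, as in the snippet from the question above.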