
How to freeze layers of BERT?

See original GitHub issue

How can I freeze all layers of BERT and train only the task-specific layers during fine-tuning? In pytorch-pretrained-BERT we can do it by setting requires_grad=False on all of BERT's parameters, but is there a way to do it in the TensorFlow code? I added the code below to the create_optimizer function in optimization.py:

tvars = tf.trainable_variables()
tvars = [v for v in tvars if 'bert' not in v.name]   ## my code (freeze all layers of bert)
grads = tf.gradients(loss, tvars)

Is that correct?
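
For context, here is a minimal TF1-style sketch of how that filter sits in the optimizer setup. Plain AdamOptimizer stands in for the repo's AdamWeightDecayOptimizer; the point is only that the variable list passed to tf.gradients/apply_gradients decides which weights get updated.

import tensorflow as tf  # TF 1.x API, as used by the original BERT repo

def build_train_op_frozen_bert(loss, learning_rate):
    """Sketch: update only the variables outside the 'bert' scope (the task head)."""
    tvars = tf.trainable_variables()
    # Variables created under the 'bert' variable scope (embeddings, encoder
    # layers, pooler) are excluded here, so they receive no gradient updates,
    # i.e. they stay frozen.
    tvars = [v for v in tvars if 'bert' not in v.name]
    grads = tf.gradients(loss, tvars)
    optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
    return optimizer.apply_gradients(
        list(zip(grads, tvars)),
        global_step=tf.train.get_or_create_global_step())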

Issue Analytics

  • State: open
  • Created 4 years ago
  • Comments: 15 (5 by maintainers)

Top GitHub Comments

16 reactions
hsm207 commented, May 18, 2019

@shimafoolad

I don’t understand your question, but check out my fork of BERT.

This is the part that makes sure only the layers added on top of BERT are updated during finetuning.

I’ve also written a script that compares the weights in two checkpoint files and prints those that differ. I fine-tuned BERT on CoLA and compared the checkpoints at steps 0 and 267. As expected, only the weights associated with output_weights and output_bias are different:

[screenshot: checkpoint diff showing only output_weights and output_bias changed]
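
A minimal sketch of that kind of checkpoint comparison (not hsm207's actual script; the checkpoint prefixes below are placeholders):

import numpy as np
import tensorflow as tf

def diff_checkpoints(ckpt_a, ckpt_b):
    """Print the names of variables whose values differ between two checkpoints."""
    reader_a = tf.train.load_checkpoint(ckpt_a)
    reader_b = tf.train.load_checkpoint(ckpt_b)
    for name in reader_a.get_variable_to_shape_map():
        if reader_b.has_tensor(name) and not np.allclose(
                reader_a.get_tensor(name), reader_b.get_tensor(name)):
            print(name)

# e.g. diff_checkpoints('output/model.ckpt-0', 'output/model.ckpt-267')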

I hope this answers your question.

@hkvision Try fine-tuning for more epochs and with a higher learning rate. I fine-tuned on the CoLA dataset using the default hyperparameters, and here are my results after 5 epochs:

[screenshot: CoLA eval results after 5 epochs]

This is what I get after 50 epochs, progressively increasing the learning rate by a few orders of magnitude: [screenshot: CoLA eval results after 50 epochs]

3 reactions
hsm207 commented, Sep 16, 2019

@OYE93

Have a look at this line.

tvars now contains a list of all the weights outside BERT. You will need to add to it the parameters from layer 11 onwards; you can check the checkpoint files to see how these weights are named.
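
A hedged sketch of what that could look like. The 'bert/encoder/layer_11' prefix is an assumption based on how the public BERT checkpoints name their encoder layers; confirm the names against your own checkpoint first.

import tensorflow as tf  # TF 1.x API

# To confirm the naming, list the variables stored in the checkpoint, e.g.:
# for name, shape in tf.train.list_variables('/path/to/bert_model.ckpt'):
#     print(name, shape)

def trainable_vars_with_top_bert_layer():
    """Task-head weights plus BERT encoder layer 11; layers 0-10 and the embeddings stay frozen."""
    all_vars = tf.trainable_variables()
    tvars = [v for v in all_vars if 'bert' not in v.name]                 # weights outside BERT
    tvars += [v for v in all_vars if 'bert/encoder/layer_11/' in v.name]  # unfreeze layer 11
    return tvars

The returned list would then be handed to tf.gradients and apply_gradients in create_optimizer, exactly as in the snippet from the original question.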

Read more comments on GitHub >

Top Results From Across the Web

  • how to freeze bert model and just train a classifier? · Issue #400
    Hi the BERT models are regular PyTorch models, you can just use the usual way we freeze layers in PyTorch. For example you...
  • How many layers of my BERT model should I freeze? ❄️
    Freezing layers means disabling gradient computation and backpropagation for the weights of these layers. This is a common technique in NLP...
  • How to freeze some layers of BertModel - Hugging Face Forums
    I have a pytorch model with BertModel as the main part and a custom head. I want to freeze the embedding layer and...
  • How to freeze some layers of BERT in fine tuning in tf2.keras
    I found the answer and I share it here. Hope it can help others. By the help of this article, which is about...
  • Does BERT freeze the entire model body when it does fine ...
    Does it freeze the weights that have been provided by the pre-trained model and only alter the top classification layer, or does it...
