How to freeze layers of BERT?
How can I freeze all layers of BERT and train only the task-specific layers during the fine-tuning process?
We can do it in pytorch-pretrained-BERT by setting requires_grad = False for all of BERT's layers, but is there any way to do the same in the TensorFlow code?
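For reference, the PyTorch approach mentioned above might look roughly like the following minimal sketch. It assumes pytorch-pretrained-BERT's BertForSequenceClassification, where the encoder is exposed as model.bert and the task head as model.classifier; the checkpoint name is illustrative.

from pytorch_pretrained_bert import BertForSequenceClassification

# Load a pretrained classifier; the encoder lives under model.bert,
# while the task head (model.classifier) stays trainable.
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)

# Freeze every parameter of the BERT encoder.
for param in model.bert.parameters():
    param.requires_grad = False

# Give the optimizer only the parameters that still require gradients.
trainable_params = [p for p in model.parameters() if p.requires_grad]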
I added the code below to the create_optimizer function in optimization.py:

tvars = tf.trainable_variables()
tvars = [v for v in tvars if 'bert' not in v.name]  # my code (freeze all layers of BERT)
grads = tf.gradients(loss, tvars)

Is that correct?
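For context, here is a minimal sketch of how that filtered variable list plugs into the rest of the gradient step. Assumptions: TF 1.x graph mode as in the original BERT repo, a plain AdamOptimizer standing in for the repo's AdamWeightDecayOptimizer, and build_frozen_train_op is an illustrative helper name.

import tensorflow as tf  # TF 1.x API, as used by the original BERT code

def build_frozen_train_op(loss, learning_rate, global_step):
    # Collect all trainable variables, then drop everything under the 'bert/' scope
    # so that only the task-specific head receives gradients and updates.
    tvars = tf.trainable_variables()
    tvars = [v for v in tvars if 'bert' not in v.name]

    grads = tf.gradients(loss, tvars)
    (grads, _) = tf.clip_by_global_norm(grads, clip_norm=1.0)

    optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
    return optimizer.apply_gradients(zip(grads, tvars), global_step=global_step)

Because gradients are only computed for the filtered list, the BERT weights are left untouched while the head is trained.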
@shimafoolad
I don't understand your question, but check out my fork of BERT.
This is the part that makes sure only the layers added on top of BERT are updated during fine-tuning.
I've also written a script that compares the weights in two checkpoint files and prints the weights that differ. I fine-tuned BERT on CoLA and compared the checkpoint files at step 0 and step 267. As expected, only the weights associated with output_weights and output_bias are different. I hope this answers your question.
@hkvision Try fine-tuning for more epochs with a higher learning rate. I fine-tuned on the CoLA dataset using the default hyperparameters, and here are my results after 5 epochs:
This is what I get after 50 epochs and progressively increasing the learning rate by a few orders of magnitude:
@OYE93
Have a look at this line. tvars now contains a list of all the weights outside BERT. You will need to add to it the params from layer 11 onwards; you can check the checkpoint files for how these weights are named.
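A sketch of selecting the head plus encoder layers 11 and up by name could look like this. Assumptions: TF 1.x, variable names following the 'bert/encoder/layer_N/...' pattern used in the public BERT checkpoints, and select_trainable_vars is an illustrative helper name.

import re
import tensorflow as tf

def select_trainable_vars(min_layer=11):
    # Keep the task head (anything outside the 'bert/' scope) plus
    # BERT encoder layers whose index is >= min_layer.
    selected = []
    for var in tf.trainable_variables():
        if 'bert' not in var.name:
            selected.append(var)
            continue
        match = re.search(r'bert/encoder/layer_(\d+)/', var.name)
        if match and int(match.group(1)) >= min_layer:
            selected.append(var)
    return selected

The returned list would then replace tvars before the tf.gradients call, as in the snippet from the question above.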