Pooler weights not being updated for Multiple Choice models?
I’m trying to use pretrained BERT to finetune on a multiple choice dataset.
The parameters from pooler are excluded from the optimizer params here, however, the MultipleChoice model does indeed use pooled_output (which passes through the pooler) here.
I wasn’t able to find a similar exclusion of pooler params from the optimizer in the official repo. I think I’m missing something here. Thanks for your patience.
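To make the concern concrete, here is a minimal sketch of what a name-based filter does to the parameter list before it reaches the optimizer. It assumes the exclusion is a substring match on "pooler"; the parameter names and the `split_by_name` helper are hypothetical and only loosely modelled on a BERT-style model, not copied from the example script.

```python
# Hypothetical (name, value) pairs standing in for model.named_parameters().
# The names are illustrative assumptions, not taken from the actual script.
named_parameters = [
    ("bert.encoder.layer.0.attention.self.query.weight", "W_q"),
    ("bert.pooler.dense.weight", "W_pool"),
    ("bert.pooler.dense.bias", "b_pool"),
    ("classifier.weight", "W_cls"),
]


def split_by_name(named_params, excluded_substring):
    """Return (kept, dropped) parameter names for a substring-based filter."""
    kept = [name for name, _ in named_params if excluded_substring not in name]
    dropped = [name for name, _ in named_params if excluded_substring in name]
    return kept, dropped


kept, dropped = split_by_name(named_parameters, "pooler")
print("passed to optimizer:", kept)
print("left out of optimizer:", dropped)
```

Any parameter in the second list never receives an optimizer step, so if the model's forward pass goes through the pooler, those weights stay at their pretrained values throughout finetuning.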
Issue Analytics
- State:
- Created 4 years ago
- Reactions:1
- Comments:5 (4 by maintainers)

Indeed, this looks like a bug in the run_swag.py example. What do you think, @rodgzilla? Isn’t the exclusion of the pooler parameters from optimization (line 392 of run_swag.py) a typo?

Fixed in #675.