
Pooler weights not being updated for Multiple Choice models?

See original GitHub issue

I’m trying to use a pretrained BERT model to fine-tune on a multiple choice dataset.

The parameters from the pooler are excluded from the optimizer params here; however, the MultipleChoice model does indeed use pooled_output (which passes through the pooler) here.

I wasn’t able to find a similar exclusion of pooler params from the optimizer in the official repo. I think I’m missing something here. Thanks for your patience.
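For reference, the grouping logic at issue looked roughly like the sketch below (reconstructed from memory of the pytorch-pretrained-bert examples, not a verbatim copy of run_swag.py); the filter that drops any parameter whose name contains "pooler" is the line in question:

```python
# Rough sketch of the optimizer parameter grouping in run_swag.py
# (illustrative, not a verbatim copy). The "pooler" filter below removes
# the pooler weights from the optimizer, so they are never updated, even
# though BertForMultipleChoice feeds pooled_output into its classifier.
param_optimizer = list(model.named_parameters())
param_optimizer = [n for n in param_optimizer if "pooler" not in n[0]]  # <-- the exclusion

no_decay = ["bias", "LayerNorm.bias", "LayerNorm.weight"]
optimizer_grouped_parameters = [
    {"params": [p for n, p in param_optimizer if not any(nd in n for nd in no_decay)],
     "weight_decay": 0.01},
    {"params": [p for n, p in param_optimizer if any(nd in n for nd in no_decay)],
     "weight_decay": 0.0},
]
```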

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Reactions: 1
  • Comments: 5 (4 by maintainers)

Top GitHub Comments

1 reaction
thomwolf commented, Apr 11, 2019

Indeed this looks like a bug in the run_swag.py example. What do you think @rodgzilla? Isn’t the exclusion of the pooler parameters from optimization (line 392 of run_swag.py) a typo?

0 reactions
meetps commented, Jun 12, 2019

Fixed in #675.
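Once the pooler exclusion is removed, a quick way to confirm the pooler weights are actually being updated is to compare them across a training step. A minimal sketch, assuming a BertForMultipleChoice model; `train_step(model, batch)` is a hypothetical helper standing in for the forward/backward/optimizer.step() loop:

```python
import torch

# Sanity check: snapshot the pooler weight, run one training step, and
# verify that the weight changed.
before = model.bert.pooler.dense.weight.detach().clone()
train_step(model, batch)  # hypothetical: forward, backward, optimizer.step()
after = model.bert.pooler.dense.weight.detach()
print("pooler updated:", not torch.equal(before, after))
```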

Read more comments on GitHub >

