Electra loss computation
See original GitHub issue
Hello - I looked at simpletransformers/language_modeling/language_modeling_model.py
and it seems the loss computation for Electra does not take into account the discriminator loss.
On line 524 we have loss = outputs[0], whereas on line 470 of /simpletransformers/custom_models/models.py we return g_loss, d_loss, g_scores, d_scores, d_labels.
It seems that only the generator loss is being optimized.
In the paper (https://arxiv.org/pdf/2003.10555.pdf) the authors combine the NLL (MLM) loss from the generator and the BCE loss from the discriminator into a single weighted objective (top of page 4).
Am I missing something?
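For reference, the objective at the top of page 4 of the paper is loss = L_MLM + lambda * L_Disc, with lambda = 50. A minimal sketch of optimizing the combined loss, given the five values returned on line 470 (the forward-call shape and the weight variable name here are illustrative assumptions, not the simpletransformers API):

```python
# Sketch only: combine the generator (MLM / NLL) loss and the discriminator
# (BCE) loss as in the ELECTRA paper, loss = L_MLM + lambda * L_Disc.
outputs = model(*batch)  # forward pass; the call signature is illustrative
g_loss, d_loss, g_scores, d_scores, d_labels = outputs

discriminator_loss_weight = 50.0  # lambda = 50 in the paper; name is hypothetical
loss = g_loss + discriminator_loss_weight * d_loss  # rather than loss = outputs[0]
loss.backward()
```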
Issue Analytics
- State:
- Created 3 years ago
- Comments: 29 (14 by maintainers)
Top Results From Across the Web
ELECTRA - Hugging Face
ELECTRA is a new pretraining approach which trains two transformer ... FloatTensor of shape (1,) ) — Total loss of the ELECTRA objective...
Loss of base and large models · Issue #3 - GitHub
Hi, I'm currently working on a new non-English ELECTRA model. Training on GPU seems to work and is running fine.
Learning to Sample Replacements for ... - ACL Anthology
Notice that Equation (1) uses the actual discriminator loss LD(x, c), which cannot be obtained without feeding xR into the discriminator...
Understanding ELECTRA and Training an ELECTRA ...
This clearly shows that the ability to calculate the loss over all input tokens significantly boosts the performance of a pre-trained model.
Why I created my own Electra model with memory-efficient ...
Loss over only masked tokens - BERT masks out 15% of the input ... The ELECTRA model trained on GPU for 4 days...
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Sure
Hi - Hugging Face already ties the input and output embeddings of a language model.
All we need to do is tie the input embeddings of the generator and discriminator (the discriminator has no output embeddings).
The code is simple and straightforward:
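The snippet that followed is not preserved in this archive. As a rough sketch of such tying, using the generic Hugging Face get_input_embeddings / set_input_embeddings helpers (the model classes and the shared default config below are illustrative choices, not the exact simpletransformers code):

```python
from transformers import ElectraConfig, ElectraForMaskedLM, ElectraForPreTraining

# Illustrative sketch: share the input (token) embeddings between the generator
# and the discriminator. In practice the generator usually has a smaller config
# with the same embedding_size; a single default config is used here for brevity.
config = ElectraConfig()
generator_model = ElectraForMaskedLM(config)
discriminator_model = ElectraForPreTraining(config)

# Point the generator's input embeddings at the discriminator's so both
# sub-models update the same embedding weights (the discriminator has no
# output embeddings, so this is the only tying needed between them).
generator_model.set_input_embeddings(discriminator_model.get_input_embeddings())

# Re-tie the generator's output (LM head) weights to the now-shared input
# embeddings, as Hugging Face does for any masked LM.
generator_model.tie_weights()
```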