
Electra Models have bad performance


Hi, this issue is related to #193.

I trained German sentence transformers on our brand-new German ELECTRA model german-nlp-group/electra-base-german-uncased (1) (2), but the results are relatively bad compared to other German models, even though we showed that the new ELECTRA model is better on downstream tasks (GermEval 2018 and 2017).

I train on a German XNLI dataset. dbmdz/bert-base-german-uncased reaches a cosine Spearman correlation of 0.731166, while the ELECTRA model barely reaches 0.6; both are optimized with Optuna over several steps.

What might be the reason? Do you have a theory? AFAIK the last layers of BERT and ELECTRA are identical, so I have no idea why the results differ so much.

(1): Due to a bug it does not show up on the Hugging Face hub yet; see https://github.com/huggingface/transformers/issues/6495
(2): The model card is here: https://github.com/German-NLP-Group/german-transformer-training/blob/master/model_cards/electra-base-german-uncased.md
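For reference, here is a minimal sketch of the kind of setup described above, using the classic sentence-transformers training API (transformer encoder plus mean pooling, softmax loss on NLI pairs, cosine-Spearman evaluation). The file names, column names, and hyperparameters are assumptions for illustration, not details taken from this issue.

```python
# Hedged sketch: train a German sentence embedding model on NLI data and
# evaluate it with the cosine-similarity Spearman correlation, roughly the
# setup described in the issue. Paths, columns, and hyperparameters are
# assumptions, not taken from the thread.
import csv
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, models, losses, InputExample
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

label2id = {"contradiction": 0, "entailment": 1, "neutral": 2}

# Build the sentence transformer: transformer encoder + mean pooling.
word_embedding_model = models.Transformer(
    "german-nlp-group/electra-base-german-uncased", max_seq_length=128
)
pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension())
model = SentenceTransformer(modules=[word_embedding_model, pooling_model])

# Load a hypothetical German XNLI TSV with columns: premise, hypothesis, label.
train_examples = []
with open("xnli_de_train.tsv", encoding="utf-8") as f:
    for row in csv.DictReader(f, delimiter="\t"):
        train_examples.append(
            InputExample(texts=[row["premise"], row["hypothesis"]],
                         label=label2id[row["label"]])
        )

train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)
train_loss = losses.SoftmaxLoss(
    model=model,
    sentence_embedding_dimension=model.get_sentence_embedding_dimension(),
    num_labels=len(label2id),
)

# Hypothetical STS-style dev set with a gold similarity score per pair.
dev_sent1, dev_sent2, dev_scores = [], [], []
with open("sts_de_dev.tsv", encoding="utf-8") as f:
    for row in csv.DictReader(f, delimiter="\t"):
        dev_sent1.append(row["sentence1"])
        dev_sent2.append(row["sentence2"])
        dev_scores.append(float(row["score"]))
evaluator = EmbeddingSimilarityEvaluator(dev_sent1, dev_sent2, dev_scores)

model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    evaluator=evaluator,
    epochs=1,
    warmup_steps=1000,
    output_path="output/electra-german-nli",
)
```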

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 10 (10 by maintainers)

Top GitHub Comments

1 reaction
nreimers commented, Aug 26, 2020

Hi @PhilipMay, I think the sampling strategy for 1) is quite important. Out of the box, with round-robin, the results are rather bad.

I would like to join, but I'm afraid I don't really have the time; I am currently working on several other research projects.

But I am looking forward to hearing about your experiences.
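One way to read "sampling" here is the per-step choice of which task's dataloader a batch is drawn from. The toy sketch below (plain Python with made-up batch counts, not a sentence-transformers feature) contrasts size-proportional task sampling with the strict 50/50 round-robin alternation described in the comment below.

```python
# Toy illustration of size-proportional task sampling vs. round-robin.
# The batch counts and step count are made-up assumptions.
import random

random.seed(0)

nli_batches = 10_000   # hypothetical: NLI is large
sts_batches = 300      # hypothetical: STS is small
steps = 2_000          # arbitrary number of training steps for the comparison

# Round-robin: strictly alternate, so each task gets 50% of the steps.
round_robin = ["nli" if i % 2 == 0 else "sts" for i in range(steps)]

# Size-proportional sampling: pick the task for each step with probability
# proportional to its dataset size, so the small STS set is revisited far
# less often and is less likely to be overfit.
proportional = random.choices(
    ["nli", "sts"], weights=[nli_batches, sts_batches], k=steps
)

for name, schedule in [("round-robin", round_robin), ("proportional", proportional)]:
    print(name, "-> STS share of steps:", schedule.count("sts") / steps)
# Each training step would then pull one batch from the scheduled task's
# dataloader instead of always alternating 50/50.
```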

1 reaction
nreimers commented, Aug 24, 2020

Hi @PhilipMay, I haven't evaluated which is better: 1) multi-task training on NLI + STS, or 2) training first on NLI and then on STS.

This script does 1), while I personally use 2) in my experiments, but I have never evaluated which approach is better: https://github.com/UKPLab/sentence-transformers/blob/master/examples/training/other/training_multi-task.py

Edit: As the NLI and STS dataset sizes are extremely different, I found that 2) makes a bit more sense. Otherwise, for 1), you need to figure out how to deal with the different dataset sizes. Currently the script does round-robin, i.e. NLI and STS each get 50% of the batches. This can quickly lead to overfitting on the small STS training set, while for NLI you have only seen a fraction of the samples.

Out of the box, the NLI + STS multi-task setup leads to worse scores than approach 2), where you first fine-tune on NLI and then on STS.
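To make the two approaches concrete, here is a hedged sketch using the sentence-transformers fit() API. The `model`, `nli_dataloader`, and `nli_loss` objects are assumed to exist as in the earlier sketch, and the STS data, epoch counts, and warmup steps are illustrative assumptions, not values from this thread.

```python
# Hedged sketch contrasting the two setups discussed above. `model`,
# `nli_dataloader`, and `nli_loss` are assumed to exist as in the earlier
# NLI training sketch; the STS objects below are analogous assumptions.
from torch.utils.data import DataLoader
from sentence_transformers import losses, InputExample

# Hypothetical STS training pairs with gold similarity scores scaled to [0, 1].
sts_examples = [
    InputExample(texts=["Ein Mann spielt Gitarre.",
                        "Jemand spielt ein Instrument."], label=0.8),
    # ... more pairs ...
]
sts_dataloader = DataLoader(sts_examples, shuffle=True, batch_size=16)
sts_loss = losses.CosineSimilarityLoss(model=model)

USE_MULTI_TASK = False  # toggle between the two alternatives

if USE_MULTI_TASK:
    # Approach 1: multi-task training. fit() draws batches from the objectives
    # in round-robin order, so NLI and STS each get ~50% of the update steps
    # regardless of their very different dataset sizes.
    model.fit(
        train_objectives=[(nli_dataloader, nli_loss), (sts_dataloader, sts_loss)],
        epochs=1,
        warmup_steps=1000,
    )
else:
    # Approach 2: sequential fine-tuning. Train on NLI first, then run a short
    # second stage on STS, so the small STS set is less likely to be overfit.
    model.fit(train_objectives=[(nli_dataloader, nli_loss)],
              epochs=1, warmup_steps=1000)
    model.fit(train_objectives=[(sts_dataloader, sts_loss)],
              epochs=4, warmup_steps=100)
```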


