Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging third-party libraries. It collects links to all the places you might look while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

Use Electra with `from_pretrained` in the transformers library

See original GitHub issue

Is your feature request related to a problem? Please describe.
We trained an ElectraForSequenceClassification model, but when we tried to load this pretrained model with the transformers library's ElectraForSequenceClassification using the .from_pretrained method, we got the following warnings:

Some weights of the model checkpoint at models/electra-base-generator-final were not used when initializing ElectraForSequenceClassification: ['pooler.dense.weight', 'pooler.dense.bias', 'classifier.weight', 'classifier.bias']
- This IS expected if you are initializing ElectraForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPretraining model).
- This IS NOT expected if you are initializing ElectraForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of ElectraForSequenceClassification were not initialized from the model checkpoint at models/electra-base-generator-final and are newly initialized: ['classifier.dense.weight', 'classifier.dense.bias', 'classifier.out_proj.weight', 'classifier.out_proj.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
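
For reference, a minimal sketch of the loading call that produces warnings like the ones above (the checkpoint path is the one from the log; the directory is assumed to contain the usual config.json and pytorch_model.bin):

```python
from transformers import ElectraForSequenceClassification

# Loading a checkpoint whose classification-head layout differs from what
# ElectraForSequenceClassification expects triggers the
# "Some weights ... were not used / are newly initialized" warnings.
model = ElectraForSequenceClassification.from_pretrained(
    "models/electra-base-generator-final"
)
```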

Describe the solution you'd like
Is there any way to convert this model without running training again?

Describe alternatives you've considered
Can you provide a script, or some hints on how such a conversion could be implemented?
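
Purely as an illustration of what such a conversion script might look like (the paths, num_labels, and the decision to copy the old linear classifier into out_proj are assumptions; whether that copy is meaningful depends on how the original head was trained), something along these lines is possible:

```python
import torch
from transformers import ElectraConfig, ElectraForSequenceClassification

src = "models/electra-base-generator-final"   # checkpoint path from the warning above
dst = "models/electra-base-converted"         # hypothetical output path

# Raw tensors saved with the original model; depending on how the checkpoint
# was written, backbone keys may or may not carry an "electra." prefix.
state = torch.load(f"{src}/pytorch_model.bin", map_location="cpu")

config = ElectraConfig.from_pretrained(src, num_labels=2)   # num_labels is an assumption
model = ElectraForSequenceClassification(config)

# The old head is a plain linear layer ('classifier.weight' / 'classifier.bias');
# ElectraForSequenceClassification expects 'classifier.dense.*' and
# 'classifier.out_proj.*'. Copy the linear layer into out_proj when shapes match.
if ("classifier.weight" in state and "classifier.bias" in state
        and state["classifier.weight"].shape == model.classifier.out_proj.weight.shape):
    state["classifier.out_proj.weight"] = state.pop("classifier.weight")
    state["classifier.out_proj.bias"] = state.pop("classifier.bias")

# strict=False tolerates whatever still does not line up (pooler.*, classifier.dense.*).
missing, unexpected = model.load_state_dict(state, strict=False)
print("still missing:", missing)
print("still unexpected:", unexpected)

model.save_pretrained(dst)
```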

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 7 (5 by maintainers)

Top GitHub Comments

1 reaction
WeberJulian commented, Oct 12, 2020

Hi, is there any news on this topic? I trained a model with simpletransformers, but my inference code uses the transformers library.

1 reaction
ThilinaRajapakse commented, Jul 14, 2020

You should be able to use it without retraining the model.

The warning is issued because the model weights are initialized directly through PyTorch instead of through the from_pretrained() method.
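
The warning is informational in that sense: from_pretrained() still loads every tensor it can match and only re-initializes the rest. Assuming the classification head was actually trained or has been remapped as sketched earlier (and that tokenizer files were saved alongside the weights; both are assumptions here), inference with plain transformers would look roughly like this:

```python
import torch
from transformers import ElectraTokenizerFast, ElectraForSequenceClassification

ckpt_dir = "models/electra-base-generator-final"   # path from the warning above

# Assumes tokenizer files were saved alongside the model weights.
tokenizer = ElectraTokenizerFast.from_pretrained(ckpt_dir)
model = ElectraForSequenceClassification.from_pretrained(ckpt_dir)
model.eval()

inputs = tokenizer("An example sentence to classify.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs)[0]   # first output is the classification logits
print(logits.softmax(dim=-1))
```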

Read more comments on GitHub >

Top Results From Across the Web

  • EleutherAI/gpt-j-6B - Hugging Face
    GPT-J 6B is a transformer model trained using Ben Wang's Mesh Transformer JAX. "GPT-J" refers to the class of model, while "6B" represents ...
  • Deploy GPT-J 6B for inference using Hugging Face ...
    Learn how to deploy EleutherAI's GPT-J 6B for inference using Hugging Face Transformers and Amazon SageMaker.
  • Load a pre-trained model from disk with Huggingface ...
    Assuming your pre-trained (PyTorch-based) transformer model is in a 'model' folder in your current working directory, the following code can load your ...
  • A Deep Dive Into Transformers Library - Analytics Vidhya
    Here, we will deep dive into the Transformers library and explore how to use available pre-trained models and tokenizers from ModelHub.
  • Pretrain Transformers Models in PyTorch Using Hugging Face ...
    Use an already pretrained transformers model and fine-tune (continue training) it on your custom dataset. The transformers library needs to be ...
