RoBERTa model problem
Hi,
I was trying to fine-tune the RoBERTa model on my own task with the implementation from this fantastic repo. However, I encountered one significant problem.
I trained the model on Google Colab and saved it with

```python
torch.save(model.state_dict(), f"/content/drive/My Drive/roberta/models/state_fold{i}")
```

and then loaded it with

```python
model.load_state_dict(torch.load(path, map_location='cpu'))
```

on my local machine, where the method `extract_features` then returned the same output regardless of the input.
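For reference, a minimal sketch of the full round trip (paths are placeholders; the `strict=True` flag and the `eval()` call are additions for this sketch: `strict=True` makes mismatched state_dict keys raise instead of being silently ignored, and `eval()` disables dropout so `extract_features` is deterministic):

```python
import torch

# Load a fresh roberta.base and restore the fine-tuned weights. Note that the
# state_dict keys must match the object load_state_dict is called on: a custom
# wrapper (e.g. one holding self.roberta) prefixes keys differently than the
# bare hub model, which can cause the saved weights to be silently ignored.
model = torch.hub.load('pytorch/fairseq', 'roberta.base')
model.load_state_dict(torch.load('state_fold0', map_location='cpu'), strict=True)
model.eval()  # disable dropout for deterministic feature extraction

tokens = model.encode('Hello world!')
features = model.extract_features(tokens)  # shape (1, seq_len, 768)
```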
I have been using a workaround: fix (freeze) all of RoBERTa's parameters during training and reload RoBERTa with

```python
self.roberta = torch.hub.load('pytorch/fairseq', 'roberta.base')
```

after loading the state_dict. This fixes the issue, but it is still not satisfying, since I can only fine-tune the classification heads and not the model itself.
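Concretely, the workaround looks roughly like this (a sketch only; the wrapper class and linear head are illustrative, not the actual project code):

```python
import torch
import torch.nn as nn

class RobertaClassifier(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        self.roberta = torch.hub.load('pytorch/fairseq', 'roberta.base')
        for p in self.roberta.parameters():
            p.requires_grad = False  # RoBERTa stays frozen; only the head trains
        self.head = nn.Linear(768, num_classes)  # 768 = roberta.base hidden size

    def forward(self, tokens):
        # features: (batch, seq_len, 768); use the <s> token as a pooled representation
        features = self.roberta.extract_features(tokens)
        return self.head(features[:, 0, :])
```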
Top GitHub Comments
Hmm, this is quite off topic, but that BPE code technically supports any language, since it's byte-level; most of the codes are English words, though, so on another language it would essentially be doing character-level modeling. We don't have code released for creating your own BPE in this format, since the dictionary is borrowed from GPT-2. We are currently working on a multilingual version, but there is no expected date yet.
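A small illustration of that point (this assumes the HuggingFace `transformers` package, which ships the same GPT-2 byte-level BPE; the example strings are arbitrary):

```python
from transformers import GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")

# English text merges into a few whole-word subwords...
print(tok.tokenize("machine learning"))  # e.g. ['machine', 'Ġlearning']

# ...while non-English text falls back to many short byte-level pieces,
# which is effectively character-level modeling.
print(tok.tokenize("მანქანური სწავლება"))
```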
@3NFBAGDU Good question! A readme was recently added for pre-training RoBERTa:
https://github.com/pytorch/fairseq/blob/master/examples/roberta/README.pretraining.md
But one problem could be that a previously built dictionary is downloaded and used; see this line:
You can’t use that dictionary for a non-English language 🤔
Maybe @myleott could give a hint how to create such a dictionary for another corpus/language 🤗
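One possible way to build such a byte-level BPE vocabulary for another corpus/language (not an official fairseq recipe, which as noted above is not released; this uses the HuggingFace `tokenizers` package, and the file names are placeholders):

```python
from tokenizers import ByteLevelBPETokenizer

# Train a GPT-2-style byte-level BPE on your own corpus.
tokenizer = ByteLevelBPETokenizer()
tokenizer.train(
    files=["my_corpus.txt"],  # placeholder corpus file
    vocab_size=50000,
    min_frequency=2,
    special_tokens=["<s>", "<pad>", "</s>", "<unk>", "<mask>"],
)
tokenizer.save_model("my_bpe")  # writes vocab.json and merges.txt
```

The resulting `vocab.json`/`merges.txt` should then stand in for the GPT-2 files when encoding the corpus, and running `fairseq-preprocess` without `--srcdict` builds a fresh `dict.txt` from the encoded data instead of reusing the downloaded English one.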