Loading mBART Large 50 MMT (many-to-many) is slow
Environment info
I’m installing the library directly from master and running it in a Kaggle notebook.

- `transformers` version: 4.4.0.dev0
- Platform: Linux-5.4.89+-x86_64-with-debian-buster-sid
- Python version: 3.7.9
- PyTorch version (GPU?): 1.7.0 (False)
- Tensorflow version (GPU?): 2.4.1 (False)
- Using GPU in script?: No
- Using distributed or parallel set-up in script?: No
Information
Model I am using: mBART-Large 50 MMT (many-to-many)
The problem arises when using:
- my own modified scripts (see details below)
After the model weights have been cached, loading them with `from_pretrained` is significantly slower than with `torch.load`.
The task I am working on is:
- my own task or dataset: Machine Translation
To reproduce
Here’s the Kaggle notebook reproducing the issue, and here’s a Colab notebook showing essentially the same thing.
Steps to reproduce the behavior:
1. Load the model with `model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-50-many-to-many-mmt")`
2. Save the model with `model.save_pretrained('./my-model')`
3. Save the model again with `torch.save(model, 'model.pt')`
4. Reload and time with `MBartForConditionalGeneration.from_pretrained('./my-model')`
5. Load and time with `torch.load('model.pt')`
The steps above can be reproduced inside a Kaggle notebook:
```python
model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-50-many-to-many-mmt")
model.save_pretrained('./my-model/')
torch.save(model, 'model.pt')

%time model = MBartForConditionalGeneration.from_pretrained("./my-model/")
%time torch_model = torch.load('model.pt')
```
We will notice that loading with `from_pretrained` (step 4) is significantly slower than with `torch.load` (step 5): the former takes over a minute, while the latter takes just a few seconds (or around 20 s if the file hasn’t previously been loaded into memory; see the notebook).
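For reference, roughly the same comparison can be timed outside a notebook with a small script. This is only a minimal sketch, assuming the model has already been saved to `./my-model/` and `model.pt` as in the steps above:

```python
import time

import torch
from transformers import MBartForConditionalGeneration

# Step 4: time from_pretrained on the locally saved checkpoint
start = time.perf_counter()
model = MBartForConditionalGeneration.from_pretrained("./my-model/")
print(f"from_pretrained: {time.perf_counter() - start:.1f}s")

# Step 5: time torch.load on the pickled model
start = time.perf_counter()
torch_model = torch.load("model.pt")
print(f"torch.load:      {time.perf_counter() - start:.1f}s")
```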
Expected behavior
The model should take less than 1 minute to load if it has already been cached (see step 1)
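As a rough way to see where `from_pretrained` spends its time, the load can be split into its two parts by hand: building the model from its config (which runs the random weight initialization) and then copying the saved weights in. This is just a profiling sketch, not a documented workaround, and it assumes the `save_pretrained` output in `./my-model/` contains the default `pytorch_model.bin` weights file:

```python
import time

import torch
from transformers import MBartConfig, MBartForConditionalGeneration

# Part 1: build the model from its config (runs random weight initialization)
start = time.perf_counter()
config = MBartConfig.from_pretrained("./my-model/")
model = MBartForConditionalGeneration(config)
print(f"init from config: {time.perf_counter() - start:.1f}s")

# Part 2: read the saved weights and copy them into the model
start = time.perf_counter()
state_dict = torch.load("./my-model/pytorch_model.bin", map_location="cpu")
model.load_state_dict(state_dict)
print(f"load_state_dict:  {time.perf_counter() - start:.1f}s")
```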
Top GitHub Comments
Related: https://github.com/huggingface/transformers/issues/9205