FunnelTransformerForSequenceClassification crashes when fine-tuning with the mixed precision flag
See original GitHub issue

Environment info
- transformers version: 3.2.0
- Platform: Linux-4.15.0-45-generic-x86_64-with-debian-buster-sid
- Python version: 3.7.7
- PyTorch version (GPU?): 1.5.1 (True)
- Tensorflow version (GPU?): 2.2.0 (True)
- Using GPU in script?: True
- Using distributed or parallel set-up in script?: No
Who can help
@sgugger, as I saw you were the one who worked on the PR implementing Funnel Transformer.
Information
Model I am using: Funnel Transformer
The problem arises when using:
- [ ] the official example scripts: (give details below)
- [x] my own modified scripts:
The crash happens only when the mixed precision flag is enabled. I am currently training the model without it, but I had to lower the batch size, which increases the training time.
I should mention that I just fine-tuned a `roberta-base` model using `fp16=True` and `fp16_opt_level='O1'`, so NVIDIA Apex is properly installed and configured.
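For reference, this is roughly the setup that works for `roberta-base` with the same flags (a sketch, not my exact script; the only point is that Apex mixed precision itself runs fine in this environment):

```python
# Rough sketch (not my exact script) of the roberta-base run that trains fine
# with the same mixed-precision flags, to show the Apex setup itself is OK.
from transformers import RobertaForSequenceClassification, TrainingArguments

model = RobertaForSequenceClassification.from_pretrained('roberta-base', num_labels=2)
training_args = TrainingArguments(
    output_dir='./roberta-results',
    per_device_train_batch_size=16,
    fp16=True,               # same flags that crash with Funnel below
    fp16_opt_level='O1',
)
# the Trainer call mirrors the Funnel script below and completes without errors
```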
The task I am working on is:
- [ ] an official GLUE/SQUaD task: (give the name)
- [x] my own task or dataset:
Basically, I am trying to fine-tune `FunnelForSequenceClassification` on my own custom dataset:
```python
import torch
from transformers import (
    FunnelTokenizer,
    FunnelForSequenceClassification,
    Trainer,
    TrainingArguments,
)

# some code to load data from CSV into train/val/test texts and labels
# ...

# wrapper around PyTorch for holding datasets
class IMDbDataset(torch.utils.data.Dataset):
    # same code as in the Huggingface docs
    # ...

# load tokenizer
tokenizer = FunnelTokenizer.from_pretrained('funnel-transformer/large-base')

# tokenize texts
train_encodings = tokenizer(train_texts, truncation=True, padding=True)
val_encodings = tokenizer(val_texts, truncation=True, padding=True)
test_encodings = tokenizer(test_texts, truncation=True, padding=True)

train_dataset = IMDbDataset(train_encodings, train_labels)
val_dataset = IMDbDataset(val_encodings, val_labels)
test_dataset = IMDbDataset(test_encodings, test_labels)

# training args used
training_args = TrainingArguments(
    output_dir='./results',          # output directory
    num_train_epochs=3,              # total number of training epochs
    per_device_train_batch_size=16,  # batch size per device during training
    per_device_eval_batch_size=64,   # batch size for evaluation
    # learning_rate=35e-6,
    weight_decay=0.01,               # strength of weight decay
    warmup_steps=500,                # number of warmup steps for learning rate scheduler
    logging_dir='./logs',            # directory for storing logs
    logging_steps=10,
    fp16=True,
    fp16_opt_level='O1',             # here I tried both O1 and O2 with the same result
)

model = FunnelForSequenceClassification.from_pretrained(
    'funnel-transformer/large-base',
    return_dict=True,
    num_labels=max(train_labels) + 1,
)

trainer = Trainer(
    model=model,                  # the instantiated 🤗 Transformers model to be trained
    args=training_args,           # training arguments, defined above
    train_dataset=train_dataset,  # training dataset
    eval_dataset=val_dataset,     # evaluation dataset
)

trainer.train()
trainer.save_model('funnel')
```
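For reference, the `IMDbDataset` wrapper elided above follows the custom-dataset example in the Hugging Face docs; a minimal sketch of what the class looks like (for completeness only):

```python
import torch

# Minimal sketch of the dataset wrapper referenced above (same pattern as the
# Hugging Face custom-dataset docs); my actual class is equivalent.
class IMDbDataset(torch.utils.data.Dataset):
    def __init__(self, encodings, labels):
        self.encodings = encodings   # dict of lists returned by the tokenizer
        self.labels = labels         # list of integer class ids

    def __getitem__(self, idx):
        item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
        item['labels'] = torch.tensor(self.labels[idx])
        return item

    def __len__(self):
        return len(self.labels)
```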
To reproduce
Steps to reproduce the behavior:
- Run the script
- Wait for the script to reach the training part
Stacktrace:
```
File "funnel.py", line 89, in <module>
trainer.train()
File "/root/anaconda/envs/ai/lib/python3.7/site-packages/transformers/trainer.py", line 741, in train
tr_loss += self.training_step(model, inputs)
File "/root/anaconda/envs/ai/lib/python3.7/site-packages/transformers/trainer.py", line 1046, in training_step
loss = self.compute_loss(model, inputs)
File "/root/anaconda/envs/ai/lib/python3.7/site-packages/transformers/trainer.py", line 1070, in compute_loss
outputs = model(**inputs)
File "/root/anaconda/envs/ai/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "/root/anaconda/envs/ai/lib/python3.7/site-packages/transformers/modeling_funnel.py", line 1263, in forward
return_dict=return_dict,
File "/root/anaconda/envs/ai/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "/root/anaconda/envs/ai/lib/python3.7/site-packages/transformers/modeling_funnel.py", line 950, in forward
return_dict=return_dict,
File "/root/anaconda/envs/ai/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "/root/anaconda/envs/ai/lib/python3.7/site-packages/transformers/modeling_funnel.py", line 655, in forward
layer_output = layer(query, key, value, attention_inputs, output_attentions=output_attentions)
File "/root/anaconda/envs/ai/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "/root/anaconda/envs/ai/lib/python3.7/site-packages/transformers/modeling_funnel.py", line 602, in forward
attn = self.attention(query, key, value, attention_inputs, output_attentions=output_attentions)
File "/root/anaconda/envs/ai/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "/root/anaconda/envs/ai/lib/python3.7/site-packages/transformers/modeling_funnel.py", line 548, in forward
content_score = torch.einsum("bind,bjnd->bnij", q_head + r_w_bias, k_head)
File "/root/anaconda/envs/ai/lib/python3.7/site-packages/torch/functional.py", line 292, in einsum
return _VF.einsum(equation, operands)
RuntimeError: Expected object of scalar type Float but got scalar type Half for argument #2 'mat2' in call to _th_bmm
```
This seems like a very similar issue.
Expected behavior
We should be able to train the model with mixed precision to use VRAM more efficiently.
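For what it's worth, the failing operation can be reproduced in isolation: it looks like under Apex O1 the projected query/key heads come out in fp16 while learned parameters such as `r_w_bias` stay in fp32, so `q_head + r_w_bias` is promoted back to float32 and the following einsum sees mixed dtypes. A standalone sketch (random tensors whose names only mirror the stack trace; the cast at the end is just a workaround idea, not necessarily the fix that should go into the library):

```python
import torch

# Standalone illustration of the dtype clash in the trace above (requires a GPU;
# random tensors, names only mirror modeling_funnel.py, this is not the library code).
q_head = torch.randn(2, 8, 4, 16, device='cuda').half()   # fp16 activation under O1
k_head = torch.randn(2, 8, 4, 16, device='cuda').half()   # fp16 activation under O1
r_w_bias = torch.randn(4, 16, device='cuda')              # fp32 parameter

# q_head + r_w_bias is promoted to float32, k_head is still half -> same RuntimeError
try:
    torch.einsum('bind,bjnd->bnij', q_head + r_w_bias, k_head)
except RuntimeError as e:
    print(e)

# Casting the bias to the activation dtype makes the einsum go through
# (a workaround sketch, not necessarily the proper fix for transformers):
torch.einsum('bind,bjnd->bnij', q_head + r_w_bias.type_as(q_head), k_head)
```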
Good thing to know I don’t have to build APEX next time 😉
I just pulled the latest commit from your branch and can confirm the loss is no longer `nan`. Great job and thanks for the assistance!
I have found the reason (and why I wasn't managing to fine-tune a model on a GLUE task yesterday). It turns out I was matching the authors' implementation exactly, but in transformers we put 1 in attention masks for tokens that are not masked… stupid me.
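Concretely, the convention mismatch described above looks like this (an illustration only, not the actual patch): the transformers tokenizers return `attention_mask` with 1 for real tokens and 0 for padding, while the original Funnel codebase treats 1 as a padded position, so code ported verbatim has to flip the mask:

```python
from transformers import FunnelTokenizer

# transformers convention: attention_mask is 1 for tokens to keep, 0 for padding
tokenizer = FunnelTokenizer.from_pretrained('funnel-transformer/large-base')
batch = tokenizer(['short text', 'a somewhat longer piece of text'], padding=True)
print(batch['attention_mask'])   # e.g. [[1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 1, 1, 1, 1, 1]]

# the original Funnel implementation uses the reverse convention (1 = padded/masked
# position), so a verbatim port needs something like:
# original_style_mask = [[1 - m for m in mask] for mask in batch['attention_mask']]
```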