FunnelTransformerForSequenceClassification crashes when fine-tuning with the mixed precision flag
See original GitHub issue

Environment info
- transformers version: 3.2.0
- Platform: Linux-4.15.0-45-generic-x86_64-with-debian-buster-sid
- Python version: 3.7.7
- PyTorch version (GPU?): 1.5.1 (True)
- Tensorflow version (GPU?): 2.2.0 (True)
- Using GPU in script?: True
- Using distributed or parallel set-up in script?: No
Who can help
@sgugger, as I saw you were the one who worked on the PR implementing Funnel Transformer.
Information
Model I am using: Funnel Transformer
The problem arises when using:
- [ ] the official example scripts: (give details below)
- [x] my own modified scripts:
The crash happens only when the mixed precision flag is enabled. I am currently training the model without it, but I had to lower the batch size, which increases the training time.
I should mention that I just fine-tuned a `roberta-base` model using `fp16=True` and `fp16_opt_level='O1'`, so NVIDIA Apex is properly installed and configured.
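For reference, this is roughly the setup that works for `roberta-base` with the same flags (a sketch, not my exact script; the only point is that Apex mixed precision itself runs fine in this environment):

```python
# Rough sketch (not my exact script) of the roberta-base run that trains fine
# with the same mixed-precision flags, to show the Apex setup itself is OK.
from transformers import RobertaForSequenceClassification, TrainingArguments

model = RobertaForSequenceClassification.from_pretrained('roberta-base', num_labels=2)
training_args = TrainingArguments(
    output_dir='./roberta-results',
    per_device_train_batch_size=16,
    fp16=True,               # same flags that crash with Funnel below
    fp16_opt_level='O1',
)
# the Trainer call mirrors the Funnel script below and completes without errors
```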
The task I am working on is:
- [ ] an official GLUE/SQUaD task: (give the name)
- [x] my own task or dataset:
Basically, I am trying to fine-tune `FunnelForSequenceClassification` on my own custom dataset:
```python
import torch
from transformers import (
    FunnelTokenizer,
    FunnelForSequenceClassification,
    Trainer,
    TrainingArguments,
)

# some code to load data from CSV into train/val/test texts and labels
# ...

# wrapper around PyTorch for holding datasets
class IMDbDataset(torch.utils.data.Dataset):
    # same code as in the Huggingface docs
    # ...

# load tokenizer
tokenizer = FunnelTokenizer.from_pretrained('funnel-transformer/large-base')

# tokenize texts
train_encodings = tokenizer(train_texts, truncation=True, padding=True)
val_encodings = tokenizer(val_texts, truncation=True, padding=True)
test_encodings = tokenizer(test_texts, truncation=True, padding=True)

train_dataset = IMDbDataset(train_encodings, train_labels)
val_dataset = IMDbDataset(val_encodings, val_labels)
test_dataset = IMDbDataset(test_encodings, test_labels)

# training args used
training_args = TrainingArguments(
    output_dir='./results',          # output directory
    num_train_epochs=3,              # total number of training epochs
    per_device_train_batch_size=16,  # batch size per device during training
    per_device_eval_batch_size=64,   # batch size for evaluation
    # learning_rate=35e-6,
    weight_decay=0.01,               # strength of weight decay
    warmup_steps=500,                # number of warmup steps for learning rate scheduler
    logging_dir='./logs',            # directory for storing logs
    logging_steps=10,
    fp16=True,
    fp16_opt_level='O1',             # here I tried both O1 and O2 with the same result
)

model = FunnelForSequenceClassification.from_pretrained(
    'funnel-transformer/large-base',
    return_dict=True,
    num_labels=max(train_labels) + 1,
)

trainer = Trainer(
    model=model,                  # the instantiated 🤗 Transformers model to be trained
    args=training_args,           # training arguments, defined above
    train_dataset=train_dataset,  # training dataset
    eval_dataset=val_dataset,     # evaluation dataset
)

trainer.train()
trainer.save_model('funnel')
```
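For reference, the `IMDbDataset` wrapper elided above follows the custom-dataset example in the Hugging Face docs; a minimal sketch of what the class looks like (for completeness only):

```python
import torch

# Minimal sketch of the dataset wrapper referenced above (same pattern as the
# Hugging Face custom-dataset docs); my actual class is equivalent.
class IMDbDataset(torch.utils.data.Dataset):
    def __init__(self, encodings, labels):
        self.encodings = encodings   # dict of lists returned by the tokenizer
        self.labels = labels         # list of integer class ids

    def __getitem__(self, idx):
        item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
        item['labels'] = torch.tensor(self.labels[idx])
        return item

    def __len__(self):
        return len(self.labels)
```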
To reproduce
Steps to reproduce the behavior:
- Run the script
- Wait for the script to reach the training part
Stacktrace:
```
File "funnel.py", line 89, in <module>
trainer.train()
File "/root/anaconda/envs/ai/lib/python3.7/site-packages/transformers/trainer.py", line 741, in train
tr_loss += self.training_step(model, inputs)
File "/root/anaconda/envs/ai/lib/python3.7/site-packages/transformers/trainer.py", line 1046, in training_step
loss = self.compute_loss(model, inputs)
File "/root/anaconda/envs/ai/lib/python3.7/site-packages/transformers/trainer.py", line 1070, in compute_loss
outputs = model(**inputs)
File "/root/anaconda/envs/ai/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "/root/anaconda/envs/ai/lib/python3.7/site-packages/transformers/modeling_funnel.py", line 1263, in forward
return_dict=return_dict,
File "/root/anaconda/envs/ai/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "/root/anaconda/envs/ai/lib/python3.7/site-packages/transformers/modeling_funnel.py", line 950, in forward
return_dict=return_dict,
File "/root/anaconda/envs/ai/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "/root/anaconda/envs/ai/lib/python3.7/site-packages/transformers/modeling_funnel.py", line 655, in forward
layer_output = layer(query, key, value, attention_inputs, output_attentions=output_attentions)
File "/root/anaconda/envs/ai/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "/root/anaconda/envs/ai/lib/python3.7/site-packages/transformers/modeling_funnel.py", line 602, in forward
attn = self.attention(query, key, value, attention_inputs, output_attentions=output_attentions)
File "/root/anaconda/envs/ai/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "/root/anaconda/envs/ai/lib/python3.7/site-packages/transformers/modeling_funnel.py", line 548, in forward
content_score = torch.einsum("bind,bjnd->bnij", q_head + r_w_bias, k_head)
File "/root/anaconda/envs/ai/lib/python3.7/site-packages/torch/functional.py", line 292, in einsum
return _VF.einsum(equation, operands)
RuntimeError: Expected object of scalar type Float but got scalar type Half for argument #2 'mat2' in call to _th_bmm
```
This seems like a very similar issue.
Expected behavior
We should be able to train the model with mixed precision to use VRAM more efficiently.
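For what it's worth, the failing operation can be reproduced in isolation: it looks like under Apex O1 the projected query/key heads come out in fp16 while learned parameters such as `r_w_bias` stay in fp32, so `q_head + r_w_bias` is promoted back to float32 and the following einsum sees mixed dtypes. A standalone sketch (random tensors whose names only mirror the stack trace; the cast at the end is just a workaround idea, not necessarily the fix that should go into the library):

```python
import torch

# Standalone illustration of the dtype clash in the trace above (requires a GPU;
# random tensors, names only mirror modeling_funnel.py, this is not the library code).
q_head = torch.randn(2, 8, 4, 16, device='cuda').half()   # fp16 activation under O1
k_head = torch.randn(2, 8, 4, 16, device='cuda').half()   # fp16 activation under O1
r_w_bias = torch.randn(4, 16, device='cuda')              # fp32 parameter

# q_head + r_w_bias is promoted to float32, k_head is still half -> same RuntimeError
try:
    torch.einsum('bind,bjnd->bnij', q_head + r_w_bias, k_head)
except RuntimeError as e:
    print(e)

# Casting the bias to the activation dtype makes the einsum go through
# (a workaround sketch, not necessarily the proper fix for transformers):
torch.einsum('bind,bjnd->bnij', q_head + r_w_bias.type_as(q_head), k_head)
```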
Good thing to know I don’t have to build APEX next time 😉
I just pulled the latest commit from your branch and can confirm the loss is no longer `nan`. Great job and thanks for the assistance!
I have found the reason (and why I wasn't managing to fine-tune a model on a GLUE task yesterday). It turns out I was matching the authors' implementation exactly, but in transformers we put 1 in attention masks for tokens that are not masked… stupid me.
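Concretely, the convention mismatch described above looks like this (an illustration only, not the actual patch): the transformers tokenizers return `attention_mask` with 1 for real tokens and 0 for padding, while the original Funnel codebase treats 1 as a padded position, so code ported verbatim has to flip the mask:

```python
from transformers import FunnelTokenizer

# transformers convention: attention_mask is 1 for tokens to keep, 0 for padding
tokenizer = FunnelTokenizer.from_pretrained('funnel-transformer/large-base')
batch = tokenizer(['short text', 'a somewhat longer piece of text'], padding=True)
print(batch['attention_mask'])   # e.g. [[1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 1, 1, 1, 1, 1]]

# the original Funnel implementation uses the reverse convention (1 = padded/masked
# position), so a verbatim port needs something like:
# original_style_mask = [[1 - m for m in mask] for mask in batch['attention_mask']]
```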