
FunnelTransformerForSequenceClassification crashes when fine-tuning with the mixed precision flag

See original GitHub issue

Environment info

  • transformers version: 3.2.0
  • Platform: Linux-4.15.0-45-generic-x86_64-with-debian-buster-sid
  • Python version: Python 3.7.7
  • PyTorch version (GPU?): 1.5.1 (True)
  • Tensorflow version (GPU?): 2.2.0 (True)
  • Using GPU in script?: True
  • Using distributed or parallel set-up in script?: No

Who can help

@sgugger, as I saw you were the one who worked on the PR implementing Funnel Transformer.

Information

Model I am using: Funnel Transformer

The problem arises when using:

  • [ ] the official example scripts: (give details below)
  • [x] my own modified scripts: The crash happens only when enabling the mixed precision flag. I am now training the model without it, but I had to lower the batch size, which increases the training time. I should mention that I just fine-tuned a roberta-base model using fp16=True and fp16_opt_level='O1', so NVIDIA Apex is properly installed and configured (a minimal sketch of that baseline setup follows this list).
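
For reference, here is a minimal sketch of that kind of fp16 RoBERTa baseline; the toy dataset, labels, and output directory are illustrative placeholders rather than my exact script, and it needs a GPU with Apex installed, as above:

import torch
from transformers import (RobertaForSequenceClassification, RobertaTokenizer,
                          Trainer, TrainingArguments)

class TinyDataset(torch.utils.data.Dataset):
    # a handful of dummy examples, just enough to confirm fp16 training runs
    def __init__(self, texts, labels, tokenizer):
        self.encodings = tokenizer(texts, truncation=True, padding=True)
        self.labels = labels

    def __getitem__(self, idx):
        item = {k: torch.tensor(v[idx]) for k, v in self.encodings.items()}
        item['labels'] = torch.tensor(self.labels[idx])
        return item

    def __len__(self):
        return len(self.labels)

tokenizer = RobertaTokenizer.from_pretrained('roberta-base')
train_dataset = TinyDataset(["good movie", "bad movie"] * 8, [1, 0] * 8, tokenizer)

model = RobertaForSequenceClassification.from_pretrained('roberta-base', num_labels=2)
training_args = TrainingArguments(
    output_dir='./roberta-fp16-check',
    num_train_epochs=1,
    per_device_train_batch_size=8,
    fp16=True,                # same mixed precision flags that later crash Funnel
    fp16_opt_level='O1',
)
Trainer(model=model, args=training_args, train_dataset=train_dataset).train()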

The task I am working on is:

  • [ ] an official GLUE/SQuAD task: (give the name)
  • [x] my own task or dataset: Basically, I am trying to fine-tune FunnelForSequenceClassification on my own custom dataset:
import torch
from transformers import (FunnelTokenizer, FunnelForSequenceClassification,
                          Trainer, TrainingArguments)

# some code to load data from CSV
# ...

# wrapper around PyTorch for holding datasets
class IMDbDataset(torch.utils.data.Dataset):
    # same code as in the Hugging Face fine-tuning docs
    def __init__(self, encodings, labels):
        self.encodings = encodings
        self.labels = labels

    def __getitem__(self, idx):
        item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
        item['labels'] = torch.tensor(self.labels[idx])
        return item

    def __len__(self):
        return len(self.labels)

# load tokenizer
tokenizer = FunnelTokenizer.from_pretrained('funnel-transformer/large-base')

# tokenize texts
train_encodings = tokenizer(train_texts, truncation=True, padding=True)
val_encodings = tokenizer(val_texts, truncation=True, padding=True)
test_encodings = tokenizer(test_texts, truncation=True, padding=True)

train_dataset = IMDbDataset(train_encodings, train_labels)
val_dataset = IMDbDataset(val_encodings, val_labels)
test_dataset = IMDbDataset(test_encodings, test_labels)

# training args used
training_args = TrainingArguments(
    output_dir='./results',          # output directory
    num_train_epochs=3,              # total number of training epochs
    per_device_train_batch_size=16,  # batch size per device during training
    per_device_eval_batch_size=64,  # batch size for evaluation
    #learning_rate=35e-6,
    weight_decay=0.01,               # strength of weight decay
    warmup_steps=500,                # number of warmup steps for learning rate scheduler
    logging_dir='./logs',            # directory for storing logs
    logging_steps=10,
    fp16=True,
    fp16_opt_level='O1'       # here I tried both O1 and O2 with the same result
)

model = FunnelForSequenceClassification.from_pretrained('funnel-transformer/large-base',
                                                        return_dict=True,
                                                        num_labels=max(train_labels)+1)

trainer = Trainer(
    model=model,                         # the instantiated 🤗 Transformers model to be trained
    args=training_args,                  # training arguments, defined above
    train_dataset=train_dataset,         # training dataset
    eval_dataset=val_dataset             # evaluation dataset
)

trainer.train()

trainer.save_model('funnel')

To reproduce

Steps to reproduce the behavior:

  1. Run the script
  2. Wait for the script to reach the training part

Stacktrace:

  File "funnel.py", line 89, in <module>
    trainer.train()
  File "/root/anaconda/envs/ai/lib/python3.7/site-packages/transformers/trainer.py", line 741, in train
    tr_loss += self.training_step(model, inputs)
  File "/root/anaconda/envs/ai/lib/python3.7/site-packages/transformers/trainer.py", line 1046, in training_step
    loss = self.compute_loss(model, inputs)
  File "/root/anaconda/envs/ai/lib/python3.7/site-packages/transformers/trainer.py", line 1070, in compute_loss
    outputs = model(**inputs)
  File "/root/anaconda/envs/ai/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/root/anaconda/envs/ai/lib/python3.7/site-packages/transformers/modeling_funnel.py", line 1263, in forward
    return_dict=return_dict,
  File "/root/anaconda/envs/ai/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/root/anaconda/envs/ai/lib/python3.7/site-packages/transformers/modeling_funnel.py", line 950, in forward
    return_dict=return_dict,
  File "/root/anaconda/envs/ai/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/root/anaconda/envs/ai/lib/python3.7/site-packages/transformers/modeling_funnel.py", line 655, in forward
    layer_output = layer(query, key, value, attention_inputs, output_attentions=output_attentions)
  File "/root/anaconda/envs/ai/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/root/anaconda/envs/ai/lib/python3.7/site-packages/transformers/modeling_funnel.py", line 602, in forward
    attn = self.attention(query, key, value, attention_inputs, output_attentions=output_attentions)
  File "/root/anaconda/envs/ai/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/root/anaconda/envs/ai/lib/python3.7/site-packages/transformers/modeling_funnel.py", line 548, in forward
    content_score = torch.einsum("bind,bjnd->bnij", q_head + r_w_bias, k_head)
  File "/root/anaconda/envs/ai/lib/python3.7/site-packages/torch/functional.py", line 292, in einsum
    return _VF.einsum(equation, operands)
RuntimeError: Expected object of scalar type Float but got scalar type Half for argument #2 'mat2' in call to _th_bmm

This seems like a very similar issue.
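
The error itself looks like a classic fp16/fp32 mismatch: the relative-attention bias presumably stays in fp32 while the query/key heads come out of fp16-cast layers, so the addition promotes one einsum operand back to fp32. A standalone sketch that reproduces the same kind of mismatch (toy shapes and names, not the model's real ones):

import torch

# toy stand-ins for the operands of the failing einsum in modeling_funnel.py
q_head = torch.randn(2, 8, 4, 16).half()   # activation cast to fp16 by Apex O1
k_head = torch.randn(2, 8, 4, 16).half()
r_w_bias = torch.randn(4, 16)              # parameter presumably left in fp32

mixed = q_head + r_w_bias                  # type promotion: the sum comes out fp32
print(mixed.dtype, k_head.dtype)           # torch.float32 torch.float16

try:
    # einsum lowers to a batched matmul, which requires both operands to share a dtype
    torch.einsum("bind,bjnd->bnij", mixed, k_head)
except RuntimeError as e:
    print(e)                               # Float-vs-Half mismatch, like the trace above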

Expected behavior

We should be able to train the model with mixed precision to use VRAM more efficiently.
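
As a rough illustration of the VRAM argument (this is just one activation-sized tensor, not a full training step):

import torch

x32 = torch.zeros(16, 512, 1024)   # e.g. a batch of hidden states in fp32
x16 = x32.half()                   # the same tensor in fp16

print(x32.element_size(), x16.element_size())                 # 4 2 (bytes per element)
print(x32.nelement() * x32.element_size() // 2**20, "MiB vs",
      x16.nelement() * x16.element_size() // 2**20, "MiB")    # 32 MiB vs 16 MiB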

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 8 (3 by maintainers)

Top GitHub Comments

1 reaction
iAlex97 commented, Sep 25, 2020

Good thing to know I don’t have to build APEX next time 😉

I just pulled the latest commit from your branch and can confirm loss is no longer nan.

Great job and thanks for the assistance!

1 reaction
sgugger commented, Sep 25, 2020

I have found the reason (and why I wasn’t managing to fine-tune a model on some GLUE task yesterday). Turns out I was matching exactly the implementation of the authors, but in transformers we put 1 in attention masks for tokens not masked… stupid me.
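
(For later readers: the convention referred to above is visible directly from the tokenizer output; the sample sentences below are arbitrary.)

from transformers import FunnelTokenizer

tokenizer = FunnelTokenizer.from_pretrained('funnel-transformer/large-base')
batch = tokenizer(["a short sentence", "a noticeably longer sentence with many more tokens"],
                  padding=True)

# In transformers, attention_mask marks real (unmasked) tokens with 1 and padding with 0,
# which is the convention the Funnel attention code was aligned with in the fix.
print(batch["attention_mask"])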

