Data type error while fine-tuning DeBERTa v3 Large using the provided code
Environment info
- transformers version: 4.13.0.dev0
- Platform: Ubuntu 18.04
- Python version: Python 3.6.9
- PyTorch version (GPU?): 1.11.0.dev20211110+cu111
- Tensorflow version (GPU?): 2.6.2
- Using GPU in script?: Yes
- Using distributed or parallel set-up in script?: No
Who can help
Information
Model I am using (Bert, XLNet …): microsoft/deberta-v3-large
The problem arises when using:
- [x] the official example scripts: https://huggingface.co/microsoft/deberta-v3-large#fine-tuning-with-hf-transformers
- [ ] my own modified scripts
The task I am working on is:
- [x] an official GLUE/SQuAD task: mnli
- [ ] my own task or dataset
To reproduce
Steps to reproduce the behavior:
- Go to transformers/examples/pytorch/text-classification/
- Run:
python3 run_glue.py --model_name_or_path microsoft/deberta-v3-large --task_name mnli --do_train --do_eval --evaluation_strategy steps --max_seq_length 256 --warmup_steps 50 --learning_rate 6e-5 --num_train_epochs 3 --output_dir outputv3 --overwrite_output_dir --logging_steps 10000 --logging_dir outputv3/
or run the script given in the model card: https://huggingface.co/microsoft/deberta-v3-large#fine-tuning-with-hf-transformers
Expected behavior
Training of microsoft/deberta-v3-large on the mnli dataset.
The error I am getting:

Traceback (most recent call last):
  File "run_glue.py", line 568, in <module>
    main()
  File "run_glue.py", line 486, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/home/nikhil/.local/lib/python3.6/site-packages/transformers/trainer.py", line 1316, in train
    tr_loss_step = self.training_step(model, inputs)
  File "/home/nikhil/.local/lib/python3.6/site-packages/transformers/trainer.py", line 1867, in training_step
    loss.backward()
  File "/home/nikhil/.local/lib/python3.6/site-packages/torch/_tensor.py", line 352, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/nikhil/.local/lib/python3.6/site-packages/torch/autograd/__init__.py", line 175, in backward
    allow_unreachable=True, accumulate_grad=True)  # Calls into the C++ engine to run the backward pass
  File "/home/nikhil/.local/lib/python3.6/site-packages/torch/autograd/function.py", line 199, in apply
    return user_fn(self, *args)
  File "/home/nikhil/.local/lib/python3.6/site-packages/transformers/models/deberta_v2/modeling_deberta_v2.py", line 114, in backward
    inputGrad = _softmax_backward_data(grad_output, output, self.dim, output)
TypeError: _softmax_backward_data(): argument 'input_dtype' (position 4) must be torch.dtype, not Tensor
I am also getting the same error when trying to train DeBERTa-v2.
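For context, the type mismatch can be reproduced outside the Trainer with a few lines. This is a minimal sketch, assuming a PyTorch build (such as the nightly listed above) where the fourth argument of _softmax_backward_data is already expected to be a dtype:

```python
import torch

# Assumption: the installed PyTorch expects 'input_dtype' as the fourth argument.
x = torch.randn(2, 3, requires_grad=True)
y = torch.softmax(x, dim=1)
grad = torch.ones_like(y)

# Old-style call that passes the output tensor as the fourth argument raises:
# TypeError: _softmax_backward_data(): argument 'input_dtype' (position 4)
#            must be torch.dtype, not Tensor
torch._softmax_backward_data(grad, y, 1, y)
```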
The fourth argument of _softmax_backward_data is now a torch.dtype:
https://github.com/pytorch/pytorch/blob/a34d2849cd3d39c2ce912402bfd90aea75162d1f/tools/autograd/derivatives.yaml#L1852
Changing

inputGrad = _softmax_backward_data(grad_output, output, self.dim, output)

to

inputGrad = _softmax_backward_data(grad_output, output, self.dim, output.dtype)

seems to work.
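For anyone who needs the same modeling code to run on both older and newer PyTorch releases, here is a minimal sketch of a version-guarded call. The 1.11 cutoff and the helper name softmax_backward are assumptions based on this thread, not the exact patch that later landed upstream:

```python
import torch
from packaging import version

# Assumption: the dtype-based signature appears in PyTorch 1.11. Using base_version
# strips ".devYYYYMMDD" / "+cu111" suffixes so nightlies are compared correctly.
_TORCH_HAS_DTYPE_ARG = version.parse(
    version.parse(torch.__version__).base_version
) >= version.parse("1.11")


def softmax_backward(grad_output, output, dim):
    """Call the private _softmax_backward_data with whichever fourth
    argument the installed PyTorch expects."""
    if _TORCH_HAS_DTYPE_ARG:
        # Newer PyTorch: the fourth argument is the input dtype.
        return torch._softmax_backward_data(grad_output, output, dim, output.dtype)
    # Older PyTorch: the fourth argument is a tensor.
    return torch._softmax_backward_data(grad_output, output, dim, output)
```

Checking the version once at import time, rather than wrapping the call in try/except, keeps the exception handling out of the backward hot path.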
Hello @NIKHILDUGAR, thanks for opening an issue! I'm trying to reproduce the same error as you but failing to do so: the training runs correctly.

I wonder if it isn't because you're on the bleeding edge with a PyTorch dev version? We recommend using a stable PyTorch release, as those are heavily tested in our CI. Do you get the same error when using PyTorch 1.10?