Data type error while fine-tuning Deberta v3 Large using code provided

See original GitHub issue

Environment info

  • transformers version: 4.13.0.dev0
  • Platform: Ubuntu 18.04
  • Python version: Python 3.6.9
  • PyTorch version (GPU?): 1.11.0.dev20211110+cu111
  • Tensorflow version (GPU?): 2.6.2
  • Using GPU in script?: Yes
  • Using distributed or parallel set-up in script?: No

Who can help

@LysandreJik

Information

Model I am using (Bert, XLNet …): microsoft/deberta-v3-large

The problem arises when using:

  • the official example scripts: run_glue.py

The task I am working on is:

  • an official GLUE/SQuAD task: mnli

To reproduce

Steps to reproduce the behavior:

  1. go to transformers/examples/pytorch/text-classification/
  2. Run python3 run_glue.py --model_name_or_path microsoft/deberta-v3-large --task_name mnli --do_train --do_eval --evaluation_strategy steps --max_seq_length 256 --warmup_steps 50 --learning_rate 6e-5 --num_train_epochs 3 --output_dir outputv3 --overwrite_output_dir --logging_steps 10000 --logging_dir outputv3/ or run the script given in the model card: https://huggingface.co/microsoft/deberta-v3-large#fine-tuning-with-hf-transformers

Expected behavior

Training of microsoft/deberta-v3-large on the mnli dataset.

The error I am getting:

    Traceback (most recent call last):
      File "run_glue.py", line 568, in <module>
        main()
      File "run_glue.py", line 486, in main
        train_result = trainer.train(resume_from_checkpoint=checkpoint)
      File "/home/nikhil/.local/lib/python3.6/site-packages/transformers/trainer.py", line 1316, in train
        tr_loss_step = self.training_step(model, inputs)
      File "/home/nikhil/.local/lib/python3.6/site-packages/transformers/trainer.py", line 1867, in training_step
        loss.backward()
      File "/home/nikhil/.local/lib/python3.6/site-packages/torch/_tensor.py", line 352, in backward
        torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
      File "/home/nikhil/.local/lib/python3.6/site-packages/torch/autograd/__init__.py", line 175, in backward
        allow_unreachable=True, accumulate_grad=True)  # Calls into the C++ engine to run the backward pass
      File "/home/nikhil/.local/lib/python3.6/site-packages/torch/autograd/function.py", line 199, in apply
        return user_fn(self, *args)
      File "/home/nikhil/.local/lib/python3.6/site-packages/transformers/models/deberta_v2/modeling_deberta_v2.py", line 114, in backward
        inputGrad = _softmax_backward_data(grad_output, output, self.dim, output)
    TypeError: _softmax_backward_data(): argument 'input_dtype' (position 4) must be torch.dtype, not Tensor

I am also getting the same error when trying to train Deberta-v2
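For context, the TypeError comes from a signature change in PyTorch: from 1.11 on, the fourth argument of _softmax_backward_data is the input dtype rather than a tensor. A minimal sketch of the difference, assuming PyTorch >= 1.11 (the tensors and shapes here are arbitrary illustrations):

    import torch
    from torch import _softmax_backward_data

    x = torch.randn(2, 3)
    out = torch.softmax(x, dim=-1)
    grad = torch.ones_like(out)

    # Passing the output tensor as the 4th argument, as the old DeBERTa code does,
    # raises the TypeError shown in the traceback above on PyTorch >= 1.11:
    # _softmax_backward_data(grad, out, -1, out)        # TypeError: must be torch.dtype

    # The 4th argument is now expected to be a torch.dtype.
    input_grad = _softmax_backward_data(grad, out, -1, out.dtype)
    print(input_grad.shape)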

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments:11 (3 by maintainers)

Top GitHub Comments

13 reactions
amathews-amd commented, Feb 12, 2022

The fourth argument of _softmax_backward_data is now a torch.dtype.

https://github.com/pytorch/pytorch/blob/a34d2849cd3d39c2ce912402bfd90aea75162d1f/tools/autograd/derivatives.yaml#L1852

Changing inputGrad = _softmax_backward_data(grad_output, output, self.dim, output) to inputGrad = _softmax_backward_data(grad_output, output, self.dim, output.dtype) seems to work.
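For reference, below is a self-contained sketch of a masked-softmax autograd Function that mirrors DeBERTa's XSoftmax with that one-line change applied. The class name and the usage tensors are illustrative only, and the sketch assumes PyTorch >= 1.11; older releases still expect the output tensor as the fourth argument, so a version check is needed if both must be supported.

    import torch
    from torch import _softmax_backward_data

    class XSoftmaxPatched(torch.autograd.Function):
        # Masked softmax mirroring modeling_deberta_v2.XSoftmax, with the
        # backward() call updated for the PyTorch >= 1.11 signature.

        @staticmethod
        def forward(ctx, input, mask, dim):
            ctx.dim = dim
            rmask = ~(mask.to(torch.bool))
            # Push masked positions toward -inf before the softmax
            output = input.masked_fill(rmask, torch.finfo(input.dtype).min)
            output = torch.softmax(output, ctx.dim)
            output.masked_fill_(rmask, 0)
            ctx.save_for_backward(output)
            return output

        @staticmethod
        def backward(ctx, grad_output):
            (output,) = ctx.saved_tensors
            # Pass the dtype, not the tensor, as the fourth argument
            input_grad = _softmax_backward_data(grad_output, output, ctx.dim, output.dtype)
            return input_grad, None, None

    # Illustrative usage: attention scores with a broadcastable attention mask
    scores = torch.randn(1, 4, 8, 8, requires_grad=True)
    mask = torch.ones(1, 1, 8, 8, dtype=torch.long)
    probs = XSoftmaxPatched.apply(scores, mask, -1)
    probs.sum().backward()   # exercises the patched backward without the TypeError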

1 reaction
LysandreJik commented, Nov 19, 2021

Hello @NIKHILDUGAR, thanks for opening an issue! I’m trying to get the same error as you but I’m failing at doing so: the training runs correctly.

I wonder if it isn’t because you’re on the bleeding edge with a PyTorch dev version? We recommend using a PyTorch stable release as those are heavily tested in our CI. Do you get the same error when using PyTorch 1.10?

Read more comments on GitHub >

Top Results From Across the Web

  • Data type error while trying to fine tune Deberta v3 Large
    Hi @NDugar, You get the error below because the dataset you use for fine-tuning does not have a validation split. As you can...
  • Deberta-v3-large model fine tuning for Kaggle Competition ...
    Also Checkout my 2nd Channel ( on Trading, Crypto & Investments ) - https://www.youtube.com/channel/UChMwVQBFtaOga5Mh0uE1Icg I am a Banker ...
  • Unable to Finetune Deberta - Stack Overflow
    I am trying to finetune deberta for irony detection task, colab's notebook ... When I try to use 'microsoft/deberta-v3-base' checkpoint with ...
  • How to Calculate Number of Model Parameters for PyTorch ...
    We live in the age of readily accessible large models. Anyone can create a Kaggle Kernel with a pretrained Deberta v3 model and...
  • Fine-Tuning Scheduler - PyTorch Lightning - Read the Docs
    Once the finetuning-scheduler package is installed, the FinetuningScheduler callback is available for use with PyTorch Lightning.
