
[BUG] DeBERTa has bad performance when using ZeRO Stage-3, with continuous warnings "A module has unknown inputs or outputs type"


Describe the bug
DeBERTa has bad performance when using ZeRO Stage-3. stdout shows a continuous stream of warnings:

[stage3.py:104:_apply_to_tensors_only] A module has unknown inputs or outputs type (<class 'torch.nn.parameter.Parameter'>)
and the tensors embedded in it cannot be detected. The ZeRO-3 hooks designed to trigger before or after backward pass of the
module relies on knowing the input and output tensors and therefore may not get triggered properly.
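
For context on where this comes from: the ZeRO-3 hooks recurse through each module's inputs and outputs to find the tensors they must track. Below is a simplified sketch of that recursion (illustrative only, not DeepSpeed's exact stage3.py code): the traversal matches exact torch.Tensor instances and common containers, so a bare torch.nn.parameter.Parameter, such as the relative-position embedding weight that DeBERTa passes directly between submodules, falls through to the warning branch.

import torch

def apply_to_tensors_only(fn, outputs):
    # Recurse into common containers; apply fn to plain tensors.
    if type(outputs) in (tuple, list):
        return type(outputs)(apply_to_tensors_only(fn, v) for v in outputs)
    elif type(outputs) is torch.Tensor:
        return fn(outputs)
    else:
        # The exact-type check above means a torch.Tensor subclass such as
        # Parameter lands here, producing the "unknown inputs or outputs
        # type" warning instead of being hooked.
        print(f"A module has unknown inputs or outputs type ({type(outputs)})")
        return outputs

p = torch.nn.Parameter(torch.zeros(4))
apply_to_tensors_only(lambda t: t, p)  # warns: type(p) is not exactly torch.Tensor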

To Reproduce
Steps to reproduce the behavior:

  1. Use the official HF Accelerate run_glue_no_trainer.py script.
  2. Set up DeepSpeed ZeRO-3 through the accelerate config command. The resulting config yaml (a rough DeepSpeed JSON equivalent is sketched after these steps):
compute_environment: LOCAL_MACHINE
deepspeed_config:
  gradient_accumulation_steps: 1
  offload_optimizer_device: none
  zero_stage: 3
distributed_type: DEEPSPEED
fsdp_config: {}
machine_rank: 0
main_process_ip: null
main_process_port: null
main_training_function: main
mixed_precision: fp16
num_machines: 1
num_processes: 2
use_cpu: false
  3. Bash script to run the finetuning of microsoft/deberta-v2-xlarge-mnli on the MRPC dataset using ZeRO Stage-3:
#!/bin/bash

time accelerate launch /home/sourab/deepspeed-test/src/text-classification/run_glue_no_trainer.py \
--task_name "mrpc" \
--max_length 128 \
--model_name_or_path "microsoft/deberta-v2-xlarge-mnli" \
--output_dir "/home/sourab/deepspeed-test/glue/mrpc_deepspeed_stage3_accelerate" \
--per_device_train_batch_size 16 \
--per_device_eval_batch_size 16 \
--gradient_accumulation_steps 1 \
--learning_rate 3.5e-6 \
--weight_decay 0.0 \
--max_grad_norm 1.0 \
--num_train_epochs 6 \
--num_warmup_steps 50 \
--with_tracking
  4. Relevant output snippets. The first shows the continuous stream of warnings; the second shows the eval metrics being worse than the same setup without DeepSpeed. [Screenshots: warning stream (2022-05-19, 12:49 AM) and eval metrics (2022-05-19, 12:55 AM).]
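
For reference, the Accelerate config in step 2 maps to roughly the following DeepSpeed JSON. This is an approximation of what the Accelerate DeepSpeed plugin builds internally, not the exact generated file; the micro batch size here comes from --per_device_train_batch_size.

{
  "fp16": { "enabled": true },
  "zero_optimization": { "stage": 3 },
  "gradient_accumulation_steps": 1,
  "train_micro_batch_size_per_gpu": 16
}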

Expected behavior
No continuous stream of warnings and no performance degradation when using DeepSpeed ZeRO Stage-3 with DeBERTa.

ds_report output:

--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
      runtime if needed. Op compatibility means that your system
      meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja 
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [NO] ....... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
 [WARNING]  please install triton==1.0.0 if you want to use sparse attention
sparse_attn ............ [NO] ....... [NO]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
async_io ............... [NO] ....... [OKAY]
utils .................. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/home/sourab/dev/lib/python3.8/site-packages/torch']
torch version .................... 1.12.0.dev20220505+cu113
torch cuda version ............... 11.3
torch hip version ................ None
nvcc version ..................... 10.2
deepspeed install path ........... ['/home/sourab/dev/lib/python3.8/site-packages/deepspeed']
deepspeed info ................... 0.6.4, unknown, unknown
deepspeed wheel compiled w. ...... torch 1.12, cuda 11.3


System info:

  • OS: Ubuntu 20.04.3 LTS (Focal Fossa)
  • GPU count and types: 1 machine with 2x NVIDIA TITAN RTX
  • Python version: Python 3.8.10

Launcher context
Accelerate launcher, which just triggers the deepspeed launcher.


Top GitHub Comments

tjruwase commented, May 24, 2022

@pacman100, thanks for sharing your update. I am glad that the performance problem is resolved in the latest code. I have created #1974 to suppress the warning noise. The PR probably needs tweaking, such as whether to report this warning some fixed number of times; right now, it is completely turned off except in debugging mode. Can you please test the PR branch?
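
The shape of such a fix might look like the following sketch: gate the warning behind a counter so it is emitted at most a fixed number of times. This is purely illustrative; the function name and the budget of 3 are hypothetical, not the contents of #1974.

import logging

logger = logging.getLogger(__name__)
_remaining_warnings = 3  # hypothetical budget: report at most 3 times

def warn_unknown_type(value):
    # Emit the unknown-type warning until the budget runs out, then go silent.
    global _remaining_warnings
    if _remaining_warnings > 0:
        _remaining_warnings -= 1
        logger.warning(
            "A module has unknown inputs or outputs type (%s) and the tensors "
            "embedded in it cannot be detected.", type(value))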

pacman100 commented, Jul 30, 2022

Hello @tjruwase, thank you for the fix 😄! Yes, the above PR is working as expected to suppress the warnings.
