[BUG] DeBERTa has bad performance when using ZeRO Stage-3, with continuous warnings "A module has unknown inputs or outputs type"
Describe the bug
DeBERTa has bad performance when using ZeRO Stage-3. stdout shows a continuous stream of warnings:
[stage3.py:104:_apply_to_tensors_only] A module has unknown inputs or outputs type (<class 'torch.nn.parameter.Parameter'>) and the tensors embedded in it cannot be detected. The ZeRO-3 hooks designed to trigger before or after backward pass of the module relies on knowing the input and output tensors and therefore may not get triggered properly.
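For context, here is a minimal sketch of why a torch.nn.Parameter can slip past this kind of tensor detection. The function name and structure below are illustrative only, not DeepSpeed's actual implementation; the point is that an exact-type check on torch.Tensor does not match Parameter, even though Parameter is a Tensor subclass:

import torch

def apply_to_tensors_only(fn, value):
    # Illustrative recursion over a module's inputs/outputs: containers
    # are traversed, but the exact-type check below misses Parameter.
    if isinstance(value, (tuple, list)):
        return type(value)(apply_to_tensors_only(fn, v) for v in value)
    if isinstance(value, dict):
        return {k: apply_to_tensors_only(fn, v) for k, v in value.items()}
    if type(value) is torch.Tensor:  # exact match: Parameter falls through
        return fn(value)
    return value  # a Parameter lands here, so the hook never fires

p = torch.nn.Parameter(torch.ones(2))
print(type(p) is torch.Tensor)      # False
print(isinstance(p, torch.Tensor))  # True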
To Reproduce
Steps to reproduce the behavior:
- Official HF Accelerate run_glue_no_trainer.py script
- Setting up DeepSpeed ZeRO-3 through the command accelerate config. The output config yaml:
compute_environment: LOCAL_MACHINE
deepspeed_config:
  gradient_accumulation_steps: 1
  offload_optimizer_device: none
  zero_stage: 3
distributed_type: DEEPSPEED
fsdp_config: {}
machine_rank: 0
main_process_ip: null
main_process_port: null
main_training_function: main
mixed_precision: fp16
num_machines: 1
num_processes: 2
use_cpu: false
- bash script to run the fine-tuning of microsoft/deberta-v2-xlarge-mnli on the MRPC dataset using ZeRO Stage-3:
#!/bin/bash
time accelerate launch /home/sourab/deepspeed-test/src/text-classification/run_glue_no_trainer.py \
--task_name "mrpc" \
--max_length 128 \
--model_name_or_path "microsoft/deberta-v2-xlarge-mnli" \
--output_dir "/home/sourab/deepspeed-test/glue/mrpc_deepspeed_stage3_accelerate" \
--per_device_train_batch_size 16 \
--per_device_eval_batch_size 16 \
--gradient_accumulation_steps 1 \
--learning_rate 3.5e-6 \
--weight_decay 0.0 \
--max_grad_norm 1.0 \
--num_train_epochs 6 \
--num_warmup_steps 50 \
--with_tracking
- Relevant output snippets. The first shows the continuous warnings; the second shows the eval metrics being worse than the same setup without DeepSpeed.
Expected behavior
No continuous stream of warnings and no performance degradation when using DeepSpeed ZeRO Stage-3 with DeBERTa.
ds_report output
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
runtime if needed. Op compatibility means that your system
meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [NO] ....... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
[WARNING] please install triton==1.0.0 if you want to use sparse attention
sparse_attn ............ [NO] ....... [NO]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
async_io ............... [NO] ....... [OKAY]
utils .................. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/home/sourab/dev/lib/python3.8/site-packages/torch']
torch version .................... 1.12.0.dev20220505+cu113
torch cuda version ............... 11.3
torch hip version ................ None
nvcc version ..................... 10.2
deepspeed install path ........... ['/home/sourab/dev/lib/python3.8/site-packages/deepspeed']
deepspeed info ................... 0.6.4, unknown, unknown
deepspeed wheel compiled w. ...... torch 1.12, cuda 11.3
System info (please complete the following information):
- OS: Ubuntu 20.04.3 LTS (Focal Fossa)
- GPU count and types: 1 machine with 2x NVIDIA TITAN RTX
- Python version: Python 3.8.10
Launcher context
Are you launching your experiment with the deepspeed launcher, MPI, or something else?
Accelerate launcher, which just triggers the deepspeed launcher.
Top GitHub Comments
@pacman100, thanks for sharing your update. I am glad that the performance problem is resolved in the latest code. I have created #1974 to suppress the warning noise. The PR probably needs tweaking, such as deciding whether to report this warning a fixed number of times. Right now, it is completely turned off except in debugging mode. Can you please test the PR branch?
Hello @tjruwase, Thank you for the fix 😄! Yes, the above PR is working as expected to suppress the warnings.
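For reference, a rough sketch of the suppression approach described above, assuming a debug-gated, warn-once scheme; the helper name and structure here are hypothetical, not the actual code in #1974:

import logging

logger = logging.getLogger(__name__)
_warned_unknown_io = False

def warn_unknown_io_type(value_type):
    # Hypothetical gating: emit the noisy warning at most once, and only
    # when debug logging is enabled, instead of on every forward/backward.
    global _warned_unknown_io
    if logger.isEnabledFor(logging.DEBUG) and not _warned_unknown_io:
        logger.debug(
            "A module has unknown inputs or outputs type (%s) and the "
            "tensors embedded in it cannot be detected.", value_type)
        _warned_unknown_io = True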