
[BUG] DeepSpeed non-deterministic inference with HF GPT2 when `replace_with_kernel_inject=True`


Describe the bug

https://github.com/microsoft/DeepSpeed/issues/1950 describes a bug where running inference twice on the same input produces different outputs. It was supposedly fixed in version 0.6.5, but I am encountering a similar bug (with Hugging Face's GPT2 on an NVIDIA A10G) in every DeepSpeed version from 0.6.3 onward when running long sequences. My current workaround is to use version 0.6.1.
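Until a fix lands, a script could guard against the versions reported above. This is only a minimal sketch: the constant `FIRST_REPORTED_AFFECTED` and the helper names are hypothetical, the range reflects only what this report observed (0.6.3 through 0.7.0), and no upper bound is known.

```python
import re
from typing import Tuple

# Assumption taken from this report only, not confirmed by the maintainers:
# 0.6.1 behaves correctly, 0.6.3 and later were observed to misbehave.
FIRST_REPORTED_AFFECTED = (0, 6, 3)


def parse_version(version: str) -> Tuple[int, ...]:
    """Extract the leading MAJOR.MINOR.PATCH numbers from a version string."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(group) for group in match.groups())


def reported_affected(version: str) -> bool:
    """True if `version` is at or above the first version reported as affected."""
    return parse_version(version) >= FIRST_REPORTED_AFFECTED


# Possible usage (requires deepspeed to be installed):
#   import deepspeed
#   if reported_affected(deepspeed.__version__):
#       print("This DeepSpeed version may give non-deterministic inference output")
```

Tuple comparison keeps the check free of third-party dependencies; suffixes such as `+unknown` are ignored by the regular expression.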

Note: With a sequence that is too short, this bug does not appear. With a sequence that is too long, I instead hit another open bug (https://github.com/microsoft/DeepSpeed/issues/2062), which prevents inference altogether.
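Since the behaviour depends on sequence length, one way to narrow it down is to sweep over candidate lengths and report the first one that fails a check. The helper below is a hypothetical sketch, not part of DeepSpeed or transformers; `check` stands in for whatever per-length test is used (for example, truncating the tokenized input to `n` tokens and comparing repeated runs).

```python
from typing import Callable, Iterable, Optional


def first_failing_length(
    check: Callable[[int], bool], lengths: Iterable[int]
) -> Optional[int]:
    """Return the first length for which `check(length)` is False, else None."""
    for n in lengths:
        if not check(n):
            return n
    return None


# Stand-in check for illustration only: pretends that inputs longer than
# 100 tokens misbehave. The threshold 100 is arbitrary, not a measured value.
def fake_check(n: int) -> bool:
    return n <= 100


threshold = first_failing_length(fake_check, range(16, 257, 16))
```

With the stand-in check, `threshold` is the first swept length above 100. A real check would replace `fake_check` and would need a GPU and the affected DeepSpeed version.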

To Reproduce

Install packages

!pip uninstall -y torch deepspeed transformers
!pip install --upgrade pip
!pip install --upgrade torch==1.12.1 --extra-index-url https://download.pytorch.org/whl/cu116
!pip install --upgrade deepspeed==0.7.0 transformers==4.21.1

Run code

import os
import deepspeed
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

model = AutoModelForCausalLM.from_pretrained("gpt2").to("cuda")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

model = deepspeed.init_inference(model, dtype=torch.half, replace_method='auto', replace_with_kernel_inject=True)

long_sequence = "asdfjk **[][] 890 889288 =-0=- 888***&*&#*$&*(#$ &*#$ &*( *(&))  lf  ds890342888977889888***&*&#*$&*(#$ &*#$ &*( *(&))890234908 fdS 809d890342888977889888***&*&#*$&*(#$ &*#$ &*( *(&))fs 8903428889&*(#$ &*#$ &*( *(&)))"
complex_input = tokenizer(long_sequence, return_tensors="pt").to("cuda")

for _ in range(3):
    outputs = model(**complex_input)
    token_id = torch.argmax(outputs.logits.squeeze()[-1]).item()
    print(tokenizer.decode(token_id), outputs.logits.mean().item())  # we should always see the same output, but we don't

Observe that the print statement produces different output on each iteration, even though the input is always the same. For example, my last run printed:

 Season -125.25
sp -170.25
 A -82.25

Expected behavior

I expected to see the same output each time, i.e.

 Season -125.25
 Season -125.25
 Season -125.25
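The expectation can also be checked programmatically rather than by reading printed values. The following is a rough sketch with hypothetical helper names; it uses plain Python sequences so it runs without a GPU, and `run` is assumed to be any zero-argument callable that returns the flattened output of one forward pass.

```python
from typing import Callable, Sequence


def max_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest element-wise absolute difference between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("outputs have different lengths")
    return max((abs(x - y) for x, y in zip(a, b)), default=0.0)


def is_deterministic(
    run: Callable[[], Sequence[float]], n_runs: int = 3, tol: float = 0.0
) -> bool:
    """Call `run` n_runs times; True if every result matches the first within `tol`."""
    first = list(run())
    return all(max_abs_diff(first, list(run())) <= tol for _ in range(n_runs - 1))


# Possible usage with the names from the reproduction script (untested here):
#   is_deterministic(
#       lambda: model(**complex_input).logits.detach().float().flatten().tolist()
#   )
```

A tolerance of zero demands bit-identical results, which matches the expectation stated here; a small positive `tol` could be passed if tiny floating-point differences are acceptable.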

ds_report output

--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
      runtime if needed. Op compatibility means that your system
      meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [NO] ....... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-devel package with yum
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
utils .................. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/home/ec2-user/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages/torch']
torch version .................... 1.12.1+cu116
torch cuda version ............... 11.6
torch hip version ................ None
nvcc version ..................... 11.1
deepspeed install path ........... ['/home/ec2-user/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages/deepspeed']
deepspeed info ................... 0.7.0, unknown, unknown
deepspeed wheel compiled w. ...... torch 1.12, cuda 11.6

System info

OS: Amazon Linux 2
GPU count and types: one NVIDIA A10G (AWS g5.xlarge)
Python version: 3.8.12

Launcher context

Inside a Python notebook


Top GitHub Comments

RezaYazdaniAminabadi commented, Aug 20, 2022

Okay, I can confirm that raising MAX_OUT_TOKES to a large enough number of tokens makes the problem go away. We will merge the PR soon to resolve this issue. cc: @cmikeh2

RezaYazdaniAminabadi commented, Aug 19, 2022

Hi @trianxy,

I think I know where this issue is coming from. It is due to reducing the max-tokens to 128 here. We have a PR to fix this and will merge it soon. Thanks, Reza
