[BUG] DeepSpeed non-deterministic inference with HF GPT2 when `replace_with_kernel_inject=True`
Describe the bug
https://github.com/microsoft/DeepSpeed/issues/1950 describes a bug in which running inference twice on the same input produces different outputs. It was supposedly fixed in version 0.6.5, but I am encountering a similar bug (with Hugging Face's GPT2, on an NVIDIA A10G) in every DeepSpeed version from 0.6.3 onward when running long sequences. My current fix is to use version 0.6.1.
Note: with a sufficiently short sequence this bug does not appear. With an even longer sequence, I instead hit another open bug (https://github.com/microsoft/DeepSpeed/issues/2062) which prevents inference entirely.
Perhaps related bug: https://github.com/microsoft/DeepSpeed/issues/2229
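For anyone hitting this in the meantime, the version pin I mentioned above looks like this (notebook-style, matching the install commands below):
!pip install --upgrade deepspeed==0.6.1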
To Reproduce
- Install packages
!pip uninstall -y torch deepspeed transformers
!pip install --upgrade pip
!pip install --upgrade torch==1.12.1 --extra-index-url https://download.pytorch.org/whl/cu116
!pip install --upgrade deepspeed==0.7.0 transformers==4.21.1
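A quick sanity check (my addition, not part of the original steps) that the pinned versions are the ones actually loaded, before moving on to the next step:
import torch, deepspeed, transformers
print(torch.__version__, deepspeed.__version__, transformers.__version__)
# expect: 1.12.1+cu116 0.7.0 4.21.1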
- Run code
import os
import deepspeed
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
model = AutoModelForCausalLM.from_pretrained("gpt2").to("cuda")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = deepspeed.init_inference(model, dtype=torch.half, replace_method='auto', replace_with_kernel_inject=True)
long_sequence = "asdfjk **[][] 890 889288 =-0=- 888***&*&#*$&*(#$ &*#$ &*( *(&)) lf ds890342888977889888***&*&#*$&*(#$ &*#$ &*( *(&))890234908 fdS 809d890342888977889888***&*&#*$&*(#$ &*#$ &*( *(&))fs 8903428889&*(#$ &*#$ &*( *(&)))"
complex_input = tokenizer(long_sequence, return_tensors="pt").to("cuda")
for _ in range(3):
    outputs = model(**complex_input)
    token_id = torch.argmax(outputs.logits.squeeze()[-1]).item()
    print(tokenizer.decode(token_id), outputs.logits.mean().item())  # we should always see the same output, but we don't
- Observe that the output of the print statement differs on each iteration, although the input is always the same. The last time I ran it, I got e.g.
Season -125.25
sp -170.25
A -82.25
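To check this programmatically rather than by eye, here is a minimal sketch (my addition, assuming the model and complex_input objects from the code above):
# Hedged sketch: assert that repeated forward passes return identical logits.
# Uses `model` and `complex_input` from the reproduction code above.
with torch.no_grad():
    reference = model(**complex_input).logits.clone()
    for i in range(3):
        assert torch.allclose(reference, model(**complex_input).logits), \
            f"run {i}: logits diverged between identical forward passes"
On the affected versions I expect this assertion to fail for long sequences.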
Expected behavior
I expected to see the same output on every iteration, i.e.
Season -125.25
Season -125.25
Season -125.25
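Since the title points at replace_with_kernel_inject=True, a useful baseline (my own sketch, not from the original report) is the plain fp16 Hugging Face model without DeepSpeed injection, which should print identical values on every iteration:
# Hedged sketch: same loop on the unmodified HF model (no kernel injection).
# Reuses `complex_input` and `tokenizer` from the reproduction code above.
baseline = AutoModelForCausalLM.from_pretrained("gpt2").half().to("cuda")
with torch.no_grad():
    for _ in range(3):
        logits = baseline(**complex_input).logits
        token_id = torch.argmax(logits.squeeze()[-1]).item()
        print(tokenizer.decode(token_id), logits.mean().item())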
ds_report output
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
runtime if needed. Op compatibility means that your system
meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [NO] ....... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-devel package with yum
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
utils .................. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/home/ec2-user/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages/torch']
torch version .................... 1.12.1+cu116
torch cuda version ............... 11.6
torch hip version ................ None
nvcc version ..................... 11.1
deepspeed install path ........... ['/home/ec2-user/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages/deepspeed']
deepspeed info ................... 0.7.0, unknown, unknown
deepspeed wheel compiled w. ...... torch 1.12, cuda 11.6
System info (please complete the following information):
- OS: Amazon Linux 2
- GPU count and types: one NVidia A10G (AWS g5.xlarge)
- Python version: 3.8.12
Launcher context: inside a Python notebook
Top GitHub Comments
Okay, I verified that changing MAX_OUT_TOKES to a large enough number of tokens makes the problem go away. We will merge the PR soon to resolve this issue. cc: @cmikeh2

Hi @trianxy,
I think I know where this issue is coming from. It is due to the max-tokens budget being reduced to 128 here. We have a PR to fix this issue, and we will merge it soon. Thanks, Reza
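If I read the comments above correctly, the fix raises the preallocated output-token budget. On releases that include it, init_inference appears to accept a max_out_tokens argument; treat the argument name as an assumption inferred from the MAX_OUT_TOKES constant mentioned above. A sketch of the eventual workaround:
# Hedged sketch: request a larger preallocated token budget at injection time.
# `max_out_tokens` is assumed to exist in versions containing the referenced fix.
model = deepspeed.init_inference(
    model,
    dtype=torch.half,
    replace_method='auto',
    replace_with_kernel_inject=True,
    max_out_tokens=1024,  # large enough to cover the long input sequence
)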