[BUG] DeepSpeed non-deterministic inference with HF transformers when replace_with_kernel_inject=True

Describe the bug: DeepSpeed inference with HF transformers is non-deterministic when replace_with_kernel_inject=True.

To Reproduce

import os
import deepspeed
import torch
from transformers import pipeline, set_seed

local_rank = int(os.getenv("LOCAL_RANK", "0"))
world_size = int(os.getenv("WORLD_SIZE", "1"))
generator = pipeline("text-generation", model="gpt2", device=local_rank)
prompt = "In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English."

hf_greedy = generator(prompt, do_sample=False, max_length=128)

set_seed(42)
hf_sample = generator(prompt, do_sample=True, max_length=128)

generator.model = deepspeed.init_inference(
    generator.model,
    mp_size=world_size,
    dtype=torch.float,
    replace_method="auto",
    replace_with_kernel_inject=True,  # All four assertions pass when this is False
)

set_seed(42)
ds_sample_1 = generator(prompt, do_sample=True, max_length=128)
set_seed(42)
ds_sample_2 = generator(prompt, do_sample=True, max_length=128)
assert hf_sample == ds_sample_1
assert ds_sample_1 == ds_sample_2

ds_greedy_1 = generator(prompt, do_sample=False, max_length=128)
ds_greedy_2 = generator(prompt, do_sample=False, max_length=128)
assert hf_greedy == ds_greedy_1
assert ds_greedy_1 == ds_greedy_2
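To see which check fails on a given run, and how often, the seeded generation call can be wrapped in a small harness that re-seeds before every trial and reports each trial whose output differs from the first. This is a minimal standard-library sketch; `check_repeatability` and its parameters are hypothetical names, and `seed_fn` is assumed to be something like `transformers.set_seed` in the real script (or a no-op for greedy decoding).

```python
from typing import Any, Callable, List, Tuple


def check_repeatability(
    run: Callable[[], Any],
    seed_fn: Callable[[int], None],
    seed: int = 42,
    trials: int = 5,
) -> List[Tuple[int, Any]]:
    """Call `run` `trials` times, re-seeding before each call.

    Returns (trial_index, output) for every trial whose output differs
    from trial 0. An empty list means all trials matched.
    """
    outputs = []
    for _ in range(trials):
        seed_fn(seed)  # reset RNG state, as the repro does with set_seed(42)
        outputs.append(run())
    reference = outputs[0]
    return [(i, out) for i, out in enumerate(outputs) if out != reference]
```

In the repro this could be called as `check_repeatability(lambda: generator(prompt, do_sample=True, max_length=128), set_seed)` after `deepspeed.init_inference`, and with `lambda s: None` as `seed_fn` for the greedy case.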

Expected behavior: All four assertions should pass. Instead, I found that sometimes only some of them pass.
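When an assertion fails, it helps to report where the two generations first diverge rather than only that they differ. The sketch below is a hypothetical helper (`first_divergence` is not part of transformers or DeepSpeed); it assumes the text-generation pipeline's default return format, a list of dicts keyed by `"generated_text"`, so it would be called as `first_divergence(hf_greedy[0]["generated_text"], ds_greedy_1[0]["generated_text"])`.

```python
import os
from typing import Optional


def first_divergence(a: str, b: str, context: int = 20) -> Optional[str]:
    """Describe where two generated strings first differ, or return None if equal."""
    if a == b:
        return None
    # os.path.commonprefix compares character by character for plain strings.
    idx = len(os.path.commonprefix([a, b]))
    lo, hi = max(0, idx - context), idx + context
    return f"differ at char {idx}: {a[lo:hi]!r} vs {b[lo:hi]!r}"
```

Because the prompt is part of the generated text, a divergence index past the prompt's length confirms that the prompt was passed through unchanged and only the continuation differs.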

ds_report output

--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
      runtime if needed. Op compatibility means that your system
      meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [NO] ....... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
 [WARNING]  please install triton==1.0.0 if you want to use sparse attention
sparse_attn ............ [NO] ....... [NO]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-dev package with apt
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
utils .................. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/path/miniconda3/envs/deepspeed/lib/python3.9/site-packages/torch']
torch version .................... 1.11.0
torch cuda version ............... 11.3
torch hip version ................ None
nvcc version ..................... 11.1
deepspeed install path ........... ['/path/miniconda3/envs/deepspeed/lib/python3.9/site-packages/deepspeed']
deepspeed info ................... 0.6.4, unknown, unknown
deepspeed wheel compiled w. ...... torch 1.11, cuda 11.3
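Since the problem only appears with the injected kernels, the relevant row is `transformer_inference`, which this report shows as not pre-installed but compatible, i.e. it is JIT-compiled at runtime. To compare that row across environments, the op table can be parsed mechanically. This is a minimal sketch; `parse_op_report` is a hypothetical helper, and it assumes installed ops are marked `[YES]` (this report only shows `[NO]`). It strips ANSI color codes with or without the leading ESC byte, since copied terminal output often loses it.

```python
import re
from typing import Dict, Tuple

# ANSI color codes such as "\x1b[92m" or, when the ESC byte was lost, "[92m".
_ANSI = re.compile(r"\x1b?\[\d+m")
# Op rows such as "cpu_adam ............... [NO] ....... [OKAY]".
_OP_LINE = re.compile(r"^(\w+)\s*\.+\s*\[(\w+)\]\s*\.+\s*\[(\w+)\]\s*$")


def parse_op_report(text: str) -> Dict[str, Tuple[bool, bool]]:
    """Map op name -> (installed, compatible) from ds_report output."""
    ops: Dict[str, Tuple[bool, bool]] = {}
    for raw in text.splitlines():
        line = _ANSI.sub("", raw).strip()
        m = _OP_LINE.match(line)
        if m:  # headers, warnings and the ninja row do not match
            name, installed, compatible = m.groups()
            ops[name] = (installed == "YES", compatible == "OKAY")
    return ops
```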

System info (please complete the following information):