[BUG] DeepSpeed non-deterministic inference with HF transformers when replace_with_kernel_inject=True
Describe the bug
DeepSpeed inference with HF transformers is non-deterministic when replace_with_kernel_inject=True.
To Reproduce
import os
import deepspeed
import torch
from transformers import pipeline, set_seed
local_rank = int(os.getenv("LOCAL_RANK", "0"))
world_size = int(os.getenv("WORLD_SIZE", "1"))
generator = pipeline("text-generation", model="gpt2", device=local_rank)
prompt = "In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English."
hf_greedy = generator(prompt, do_sample=False, max_length=128)
set_seed(42)
hf_sample = generator(prompt, do_sample=True, max_length=128)
generator.model = deepspeed.init_inference(
    generator.model,
    mp_size=world_size,
    dtype=torch.float,
    replace_method="auto",
    replace_with_kernel_inject=True,  # Four assertions pass when it is False
)
set_seed(42)
ds_sample_1 = generator(prompt, do_sample=True, max_length=128)
set_seed(42)
ds_sample_2 = generator(prompt, do_sample=True, max_length=128)
assert hf_sample == ds_sample_1
assert ds_sample_1 == ds_sample_2
ds_greedy_1 = generator(prompt, do_sample=False, max_length=128)
ds_greedy_2 = generator(prompt, do_sample=False, max_length=128)
assert hf_greedy == ds_greedy_1
assert ds_greedy_1 == ds_greedy_2
Expected behavior The four assertions should all pass. In practice the failure is intermittent: on some runs only some of the assertions pass.
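Because the failure is intermittent, a single pair of calls can pass by chance. A minimal sketch of a repeat-and-count check follows. The helper names `count_distinct_outputs` and `is_deterministic` are hypothetical and not part of DeepSpeed or transformers, and the two stand-in callables replace the real pipeline so the snippet runs without a GPU.

```python
import itertools
from collections import Counter
from typing import Callable, Hashable


def count_distinct_outputs(generate: Callable[[], Hashable], runs: int = 10) -> Counter:
    """Call `generate` `runs` times and count how often each output appears."""
    return Counter(generate() for _ in range(runs))


def is_deterministic(generate: Callable[[], Hashable], runs: int = 10) -> bool:
    """Return True if every one of the `runs` calls produced the same output."""
    return len(count_distinct_outputs(generate, runs)) == 1


# Stand-ins for the real pipeline (assumptions, for illustration only).
stable = lambda: "same text"                      # always returns the same string
_flaky_iter = itertools.cycle(["text A", "text B"])
flaky = lambda: next(_flaky_iter)                 # alternates between two strings

# With the pipeline from the reproduction script, the callable could be, e.g.:
#   lambda: generator(prompt, do_sample=False, max_length=128)[0]["generated_text"]
# For the sampling case, call set_seed(42) inside the callable before generating.
```

The callable returns the generated string rather than the pipeline's list of dicts, because `Counter` needs hashable values.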
ds_report output
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
runtime if needed. Op compatibility means that your system
meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [NO] ....... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
[WARNING] please install triton==1.0.0 if you want to use sparse attention
sparse_attn ............ [NO] ....... [NO]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
utils .................. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/path/miniconda3/envs/deepspeed/lib/python3.9/site-packages/torch']
torch version .................... 1.11.0
torch cuda version ............... 11.3
torch hip version ................ None
nvcc version ..................... 11.1
deepspeed install path ........... ['/path/miniconda3/envs/deepspeed/lib/python3.9/site-packages/deepspeed']
deepspeed info ................... 0.6.4, unknown, unknown
deepspeed wheel compiled w. ...... torch 1.11, cuda 11.3
System info (please complete the following information):
- OS: Ubuntu 20.04.4
- GPU count and types: one machine with 1 NVIDIA TITAN RTX
- Interconnects (if applicable): n/a
- Python version: Python 3.9.7
- Any other relevant info about your setup
Launcher context deepspeed launcher and jupyter notebook
Docker context n/a
Top GitHub Comments
Done 😃 0.6.5 is out now on pypi
@shijie-wu, can you give it a try now with this PR linked above? I think we have this fixed now. Tested on A100, A6000, V100, and P40.
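Before retesting, it may help to confirm which DeepSpeed release is installed. The sketch below assumes the fix mentioned in the comments shipped in 0.6.5, which the comments suggest but do not state outright. `parse_release`, `has_release`, and `installed_deepspeed_version` are hypothetical helper names; `importlib.metadata` is part of the Python standard library.

```python
import re
from importlib import metadata
from typing import Optional, Tuple


def parse_release(version: str) -> Tuple[int, ...]:
    """Return the leading numeric components, e.g. '0.6.5+abc' -> (0, 6, 5)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def has_release(installed: str, minimum: str = "0.6.5") -> bool:
    """Return True if `installed` is at least `minimum` (assumed fix version)."""
    return parse_release(installed) >= parse_release(minimum)


def installed_deepspeed_version() -> Optional[str]:
    """Return the installed deepspeed version string, or None if not installed."""
    try:
        return metadata.version("deepspeed")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_deepspeed_version()
    if found is None:
        print("deepspeed is not installed in this environment")
    else:
        print(f"deepspeed {found}: at least 0.6.5 = {has_release(found)}")
```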