[BUG] [master] Garbage GPT-Neo-X output when using multi-gpu inference
Describe the bug
Similar to #2233 and #2133, I'm seeing garbage output when using multi-GPU fp16 inference for GPT-NeoX. Running the script below with GPT-Neo-2.7B in place of GPT-NeoX-20B works fine.
Output from 2 3090s with Deepspeed inference:
"Deepspeed is BytePtrFromStringgranwasysym BytePtrFromString BytePtrFromString BytePtrFromString BytePtrFromString BytePtrFromString _ BytePtrFromStringHypergranTal 2011 BytePtrFromString BytePtrFromString **j BytePtrFromString BytePtrFromString BytePtrFromStringgran¶Enggrantwgran _ BytePtrFromStringgran ausgranENTRY¶`){#Delta¶sysEveramssymbitgran`Ever last`grangran ** deliberate ENTRY stag Eng` BytePtrFromStringwasysym _ BytePtrFromStringwasysymBOX Eng...](granModelupgreek BytePtrFromStringamssymb BytePtrFromStringwasysym BytePtrFromStringSegment BytePtrFromString BytePtrFromString _ BytePtrFromString BytePtrFromStringupgreekEverEng_( **gran mistENTRY BytePtrFromString BytePtrFromString _amssymbwasysym..." last BytePtrFromStringwasysym BytePtrFromString BytePtrFromStringgrangran ever"
Note that 'BytePtrFromString' has shown up at the beginning of the generated tokens for every prompt I've tried.
Output from 2 3090s with huggingface accelerate (way slower than deepspeed):
"Deepspeed is \nan on-line digital media company created in January 2002. Over the past 10 \nyears, Deepspeed has provided a comprehensive digital entertainment network to\n businesses throughout the US"
To Reproduce Steps to reproduce the behavior:
- Install deepspeed master, huggingface transformers, torch, and accelerate.
- Run the following script with deepspeed to get bad output:
import os
from pathlib import Path

import deepspeed
import torch
from transformers import GPTNeoXForCausalLM, GPTNeoXTokenizerFast

CKPT_PRETRAINED = Path("/ckpt/pretrained")

# Both .half() and torch_dtype=torch.float16 exhibit the issue.
model = GPTNeoXForCausalLM.from_pretrained(
    CKPT_PRETRAINED / "EleutherAI/gpt-neox-20b",
    local_files_only=True, torch_dtype=torch.float16,
)
tokenizer = GPTNeoXTokenizerFast.from_pretrained(
    CKPT_PRETRAINED / "EleutherAI/gpt-neox-20b", local_files_only=True,
)

local_rank = int(os.getenv("LOCAL_RANK", "0"))
world_size = int(os.getenv("WORLD_SIZE", "1"))
local_device = f"cuda:{local_rank}"
print(f"local device: {local_device}")

ds_engine = deepspeed.init_inference(
    model, mp_size=world_size, dtype=torch.float16, checkpoint=None,
    replace_method="auto",
    replace_with_kernel_inject=True,
)
model = ds_engine.module

prompt = "Deepspeed is "
m_inp = tokenizer(prompt, return_tensors="pt")
attn_mask = m_inp.get("attention_mask", None).to(device=local_device)
ids = m_inp.input_ids.to(device=local_device)
with torch.no_grad():
    gen_tokens = model.generate(
        ids, attention_mask=attn_mask,
        do_sample=True, temperature=0.9, max_new_tokens=100,
        use_cache=False,  # fails with use_cache=True as well
    )
gen_text = tokenizer.batch_decode(gen_tokens)[0]
print(f"generated tokens: {gen_text}")
- Run the following script with accelerate to get good output:
import os
from pathlib import Path

import torch
from accelerate import init_empty_weights, infer_auto_device_map, load_checkpoint_and_dispatch
from transformers import AutoConfig, AutoModelForCausalLM
from transformers.models.gpt_neox.tokenization_gpt_neox_fast import GPTNeoXTokenizerFast

CKPT_PRETRAINED = Path("/ckpt/pretrained")
weights_path = "/ckpt/pretrained/EleutherAI/gpt-neox-20b"
model_name = "EleutherAI/gpt-neox-20b"

config = AutoConfig.from_pretrained("/ckpt/pretrained/EleutherAI/gpt-neox-20b/config.json")
config.use_cache = False
with init_empty_weights():
    model = AutoModelForCausalLM.from_config(config)
tokenizer = GPTNeoXTokenizerFast.from_pretrained(
    CKPT_PRETRAINED / "EleutherAI/gpt-neox-20b", local_files_only=True,
)

device_map = infer_auto_device_map(
    model,
    no_split_module_classes=["GPTNeoXLayer"],
    dtype=torch.bfloat16,  # note: succeeds with float16 as well
    max_memory={0: "21GiB", 1: "21GiB", "cpu": "20GiB"},
)
device_map["gpt_neox.embed_in"] = "cpu"
print(f"device_map: {device_map}")

load_checkpoint_and_dispatch(
    model,
    weights_path,
    device_map=device_map,
    offload_folder=None,
    offload_state_dict=False,
    dtype="bfloat16",
)
print(model)
model = model.eval()

prompt = "Deepspeed is "
m_inp = tokenizer(prompt, return_tensors="pt")
attn_mask = m_inp.get("attention_mask", None).to(device="cuda:0")
with torch.no_grad():
    gen_tokens = model.generate(
        m_inp["input_ids"].to(0), attention_mask=attn_mask,
        do_sample=True, max_new_tokens=100, temperature=0.9,
    )
gen_text = tokenizer.decode(gen_tokens[0].tolist())
print(f"generated tokens: {gen_text}")
Expected behavior
I would expect output that makes sense, as when using accelerate.
ds_report output
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
runtime if needed. Op compatibility means that your system
meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [NO] ....... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
[WARNING] please install triton==1.0.0 if you want to use sparse attention
sparse_attn ............ [NO] ....... [NO]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
utils .................. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/root/.local/share/pdm/venvs/workspace-6rDWGpm2-docker/lib/python3.10/site-packages/torch']
torch version .................... 1.12.1+cu113
torch cuda version ............... 11.3
torch hip version ................ None
nvcc version ..................... 11.3
deepspeed install path ........... ['/root/.local/share/pdm/venvs/workspace-6rDWGpm2-docker/lib/python3.10/site-packages/deepspeed']
deepspeed info ................... 0.7.3+53182531, 53182531, master
deepspeed wheel compiled w. ...... torch 0.0, cuda 0.0
System info (please complete the following information):
- OS: Ubuntu 20.04
- GPU count and types: 3x 3090 (2x 3090 used for the above scripts)
- Interconnects (if applicable): N/A
- Python version: 3.10
- Any other relevant info about your setup: Running in docker
Launcher context
launching with deepspeed: deepspeed --num_gpus 2 script.py
Docker context
### Start from NVIDIA deep learning base image
FROM nvidia/cuda:11.3.1-cudnn8-devel-ubuntu20.04
ENV TZ=America/New_York
RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone
### New NVIDIA cuda package keys ###
RUN apt-key del "7fa2af80" \
&& export this_distro="$(cat /etc/os-release | grep '^ID=' | awk -F'=' '{print $2}')" \
&& export this_version="$(cat /etc/os-release | grep '^VERSION_ID=' | awk -F'=' '{print $2}' | sed 's/[^0-9]*//g')" \
&& apt-key adv --fetch-keys "https://developer.download.nvidia.com/compute/cuda/repos/${this_distro}${this_version}/x86_64/3bf863cc.pub" \
&& apt-key adv --fetch-keys "https://developer.download.nvidia.com/compute/machine-learning/repos/${this_distro}${this_version}/x86_64/7fa2af80.pub"
### Install general packages from apt-get ###
RUN apt-get update && apt-get upgrade -y
RUN apt-get install -y build-essential
RUN apt-get install -y tzdata
RUN apt-get install -y software-properties-common curl vim tmux git wget
### Install redis
RUN curl -fsSL https://packages.redis.io/gpg | gpg --dearmor -o /usr/share/keyrings/redis-archive-keyring.gpg
RUN echo "deb [signed-by=/usr/share/keyrings/redis-archive-keyring.gpg] https://packages.redis.io/deb $(lsb_release -cs) main" | tee /etc/apt/sources.list.d/redis.list
RUN apt-get update && apt-get install -y redis
# Make redis modules directory
RUN mkdir /etc/redis/modules
## Build & install RedisJSON module
# Required dependencies to build RedisJSON
RUN apt-get install -y llvm cmake libclang1 libclang-dev cargo
# Clone RedisJSON
RUN mkdir /builds; cd /builds; git clone https://github.com/RedisJSON/RedisJSON.git;
# Build RedisJSON
RUN cd /builds/RedisJSON; cargo build --release;
# Move RedisJSON .so to redis modules directory
RUN mv /builds/RedisJSON/target/release/librejson.so /etc/redis/modules
# delete build directory
RUN rm -rf /builds
### python + pip3 install ###
RUN add-apt-repository -y ppa:deadsnakes/ppa
RUN apt-get install -y python3.10
RUN apt-get install -y python3.10-distutils
RUN apt-get install -y python3.10-dev
# v set python3.10 as default python3
RUN update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.10 1
RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.10
RUN pip3 install --upgrade pip
#############################
### BLAS + LAPACK + fortran compiler install ###
RUN apt-get install -y libblas-dev liblapack-dev gfortran
################################################
### python-poetry setup ###
#ENV POETRY_HOME="/opt/poetry"
#ENV POETRY_VERSION=1.1.13
#
#RUN curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py | python3.10 -
#ENV PATH="POETRY_HOME/bin:$PATH"
### python-pdm setup ###
RUN curl -sSL https://raw.githubusercontent.com/pdm-project/pdm/main/install-pdm.py | python3.10 -
ENV PATH="/root/.local/bin:${PATH}"
WORKDIR /workspace
COPY ./pyproject.toml ./
COPY ./pdm.lock ./
RUN pdm config venv.in_project false
RUN pdm venv create --name docker 3.10
RUN pdm install -v --no-isolation
############################
RUN ls /workspace
Additional context
When in Docker, run eval $(pdm venv activate docker) to activate the venv, then run the deepspeed command above.
Top GitHub Comments
@ryanai3 - Can you check https://github.com/microsoft/DeepSpeed/pull/2401 ?
@mrwyattii Any updates on this, or ways I could help? I tested the changes from #2310 and still have the issue (same output, starting with BytePtrFromStr).