[BUG] GPT-J + init_inference + replace_with_kernel_inject returns copy error with multiple GPUs
Describe the bug
Using the replace_with_kernel_inject option in init_inference returns a copy error when using multiple GPUs with a GPT-J model.
To Reproduce
Steps to reproduce the behavior:
- Create an inference script using HF Transformers and GPT-J (see below)
- Run the deepspeed command with multiple GPUs (a sketch of the launch command follows the script)
```python
import os

import torch
import deepspeed
from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers import pipeline as t_pipeline

# Rank and world size are set by the DeepSpeed launcher.
local_rank = int(os.getenv('LOCAL_RANK', '0'))
world_size = int(os.getenv('WORLD_SIZE', '1'))

model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B", torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
generator = t_pipeline('text-generation', model=model, tokenizer=tokenizer,
                       eos_token_id=50256, device=local_rank)

# Wrap the model for tensor-parallel inference with kernel injection.
generator.model = deepspeed.init_inference(generator.model,
                                           mp_size=world_size,
                                           dtype=torch.float16,
                                           replace_method='auto',
                                           replace_with_kernel_inject=True)

input_list = ["This is the input "]
res_ds = generator(input_list, do_sample=True, max_length=1000,
                   eos_token_id=50256, temperature=0.25, pad_token_id=50257)
```
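For step 2, a typical multi-GPU launch with the DeepSpeed launcher looks like the following; the filename infer_gptj.py is a hypothetical placeholder, since the reporter did not share theirs:

```bash
# Launch the reproduction script above on all 8 GPUs of the machine.
deepspeed --num_gpus 8 infer_gptj.py
```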
Expected behavior
No error.
ds_report output
Unavailable; not currently on the compute node.
System info:
- OS: Linux (Ubuntu)
- Hardware: one machine with 8x A100 40GB PCIe GPUs
- Python 3.8
- Docker image: pytorch/pytorch:1.9.1-cuda11.1-cudnn8-devel
Launcher context
DeepSpeed command line (see the launch sketch above).
Docker context
Base image: pytorch/pytorch:1.9.1-cuda11.1-cudnn8-devel
Additional context
- The problem does not exist when replace_with_kernel_inject is set to False (a sketch of this working configuration follows the list).
- Things work fine with replace_with_kernel_inject = True when running the script directly on a single GPU.
- The error appears to come from here: https://github.com/microsoft/DeepSpeed/blob/master/deepspeed/module_inject/replace_module.py#L74
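As a point of comparison, a minimal sketch of the reportedly working configuration, i.e. the same init_inference call from the reproduction script above with kernel injection disabled:

```python
# Same setup as the reproduction script above; only replace_with_kernel_inject changes.
generator.model = deepspeed.init_inference(generator.model,
                                           mp_size=world_size,      # one rank per GPU
                                           dtype=torch.float16,
                                           replace_method='auto',
                                           replace_with_kernel_inject=False)  # no copy error, per the report
```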
Issue Analytics
- Created: 2 years ago
- Comments: 12 (5 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Hi @TiesdeKok, I am also facing the garbage-output issue. Not sure if it is related to the issue you were having previously: https://github.com/microsoft/DeepSpeed/issues/2113
Hi @TiesdeKok, I think taking a look at this issue I opened might be relevant to your use case: https://github.com/microsoft/DeepSpeed/issues/1797. It at least explains why you got the exclamation-mark outputs, and it should also draw your attention to the outputs you're getting in case you pad some of your inputs.
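To illustrate the padding caveat in the comment above, here is a minimal sketch of padded batch generation with GPT-J; the left-padding setup is the usual convention for decoder-only models and is an assumption on my part, not something specified in this issue:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Decoder-only models like GPT-J are usually left-padded for batched generation,
# and an attention_mask must be passed so padded positions are ignored.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B", padding_side="left")
tokenizer.pad_token = tokenizer.eos_token  # GPT-J has no dedicated pad token

model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B",
                                             torch_dtype=torch.float16).to("cuda")

batch = tokenizer(["This is the input ", "A noticeably longer second input "],
                  return_tensors="pt", padding=True).to("cuda")
outputs = model.generate(batch["input_ids"],
                         attention_mask=batch["attention_mask"],
                         do_sample=True, max_length=100, temperature=0.25,
                         pad_token_id=tokenizer.eos_token_id)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```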