[BUG] GPT-J + init_inference + replace_with_kernel_inject returns copy error with multiple GPUs

Describe the bug

Using the replace_with_kernel_inject option in init_inference returns an error when using multiple GPUs (with a GPT-J model).

To Reproduce

Steps to reproduce the behavior:

  1. Create an inference script using HF Transformers and GPT-J
  2. Run the deepspeed command with multiple GPUs

import os
# Rank and world size are read from the environment set by the deepspeed launcher.
local_rank = int(os.getenv('LOCAL_RANK', '0'))
world_size = int(os.getenv('WORLD_SIZE', '1'))

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
import deepspeed
from transformers import pipeline as t_pipeline

# Load GPT-J in half precision and build a text-generation pipeline on this rank's GPU.
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B", torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
generator = t_pipeline('text-generation', model=model, tokenizer=tokenizer, eos_token_id=50256, device=local_rank)

# Wrap the model with DeepSpeed inference, splitting it across all ranks
# and enabling kernel injection (the option this report is about).
generator.model = deepspeed.init_inference(generator.model,
                                           mp_size=world_size,
                                           dtype=torch.float16,
                                           replace_method='auto',
                                           replace_with_kernel_inject=True)

input_list = ["This is the input "]

res_ds = generator(input_list, do_sample=True, max_length=1000, eos_token_id=50256, temperature=0.25, pad_token_id=50257)
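
The script above takes its rank and world size from the LOCAL_RANK and WORLD_SIZE environment variables, falling back to 0 and 1 when they are unset. When debugging a multi-GPU run it can help to confirm that each process really sees the values you expect before the model is loaded. The following is a minimal sketch of such a check; the helper name read_launcher_env is hypothetical and not part of DeepSpeed or Transformers.

```python
import os


def read_launcher_env(env=None):
    """Return (local_rank, world_size) using the same defaults as the script.

    `env` is any mapping of environment variables; it defaults to os.environ.
    This helper is an illustrative assumption, not a DeepSpeed API.
    """
    env = os.environ if env is None else env
    local_rank = int(env.get("LOCAL_RANK", "0"))
    world_size = int(env.get("WORLD_SIZE", "1"))
    # A local rank outside [0, world_size) points at a launcher or
    # environment problem rather than a model problem.
    if not 0 <= local_rank < world_size:
        raise ValueError(
            f"LOCAL_RANK={local_rank} is inconsistent with WORLD_SIZE={world_size}"
        )
    return local_rank, world_size


if __name__ == "__main__":
    rank, size = read_launcher_env()
    print(f"local_rank={rank} world_size={size}")
```

If every process prints world_size=1, the script was probably started with plain python rather than through the launcher, and mp_size would be 1 as well.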

Expected behavior

No error.

ds_report output

Unavailable, not currently in the compute node.

Screenshots

(image not included)

System info (please complete the following information):

  • OS: Linux - Ubuntu
  • Hardware: one machine with 8x A100 40 GB PCIe
  • Python 3.8
  • Using the following docker image: pytorch/pytorch:1.9.1-cuda11.1-cudnn8-devel

Launcher context

DeepSpeed command line

Docker context

Base image: pytorch/pytorch:1.9.1-cuda11.1-cudnn8-devel

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 12 (5 by maintainers)

Top GitHub Comments

lanking520 commented, Jul 20, 2022 (1 reaction)

Hi @TiesdeKok I am also facing the garbage output issue. Not sure if it is related to the issue you were having previously: https://github.com/microsoft/DeepSpeed/issues/2113

tomerip commented, Feb 27, 2022 (1 reaction)

Hi @TiesdeKok, I think this issue I opened might be relevant to your use case: https://github.com/microsoft/DeepSpeed/issues/1797 I think it at least explains why you got the exclamation-mark outputs, and it should also draw your attention to the outputs you get when you pad some of your inputs.
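
As a hedged aside on the exclamation-mark outputs: in the GPT-2 byte-pair vocabulary that the GPT-J tokenizer builds on, token id 0 decodes to "!". A long run of "!" in the generated text therefore corresponds to a run of zeros in the generated token ids, which is easier to spot on the raw ids than on the decoded string. The sketch below is a hypothetical diagnostic (the function name and the 0.5 threshold are assumptions, not anything from DeepSpeed or Transformers) that reports how much of a sequence is token id 0.

```python
def zero_token_fraction(token_ids):
    """Return the fraction of ids in `token_ids` that equal 0.

    For a GPT-2 style vocabulary, id 0 decodes to "!", so a value close
    to 1.0 matches the "all exclamation marks" symptom.
    """
    ids = list(token_ids)
    if not ids:
        return 0.0
    return sum(1 for t in ids if t == 0) / len(ids)


def looks_degenerate(token_ids, threshold=0.5):
    """Flag a sequence as suspicious when most of its ids are 0.

    The threshold is an arbitrary illustrative choice.
    """
    return zero_token_fraction(token_ids) >= threshold


if __name__ == "__main__":
    sample = [0, 0, 0, 0, 318, 0, 0, 0]
    print(zero_token_fraction(sample), looks_degenerate(sample))
```

Running the check on the ids of both padded and unpadded inputs is one way to see whether padding changes the result, which is the situation the comment above describes.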
