Bug occurs when using a Hugging Face seq2seq model with the inference engine.
I'm trying to deploy some language models using DeepSpeed's inference engine. Currently I am deploying a seq2seq language model, and I have succeeded in parallelizing it. However, when I try to generate, the error below occurs. Reproduction script:
```python
import torch
from torch.distributed import get_rank
from deepspeed import InferenceEngine
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "facebook/bart-large"
model = AutoModelForSeq2SeqLM.from_pretrained(model_name).half()
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Wrap the model for tensor-parallel inference across two GPUs.
model = InferenceEngine(
    model=model,
    mp_size=2,
    dtype=torch.half,
)
torch.cuda.empty_cache()

tokens = tokenizer.encode(
    "Hello",
    return_tensors="pt",
    truncation=True,
    padding=True,
).cuda()

output = model.generate(
    tokens,
    min_length=30,
    max_length=31,
)  # <--- problem

if get_rank() == 0:
    print(f"Output: {tokenizer.decode(output.tolist()[0])}")
```
Launching with `deepspeed --num_gpus=2 inference.py` then fails with:

```
RuntimeError: Tensors must be non-overlapping and dense
```
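As background (my reading, not confirmed in this thread): this message is typically raised by PyTorch's distributed machinery when a collective is handed a tensor view whose strides do not describe a dense, non-overlapping block of memory. A minimal, DeepSpeed-free illustration of the dense/non-dense distinction:

```python
import torch

x = torch.zeros(4, 4)
view = x.t()                   # transposed view of the same storage
print(view.is_contiguous())    # False: strides no longer describe a dense layout
dense = view.contiguous()      # copies into a fresh, dense buffer
print(dense.is_contiguous())   # True
```

Beam reordering and cache shuffling inside `generate` produce exactly these kinds of strided views, which is presumably what the tensor-parallel collectives choke on here.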
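As a side note, DeepSpeed also exposes a `deepspeed.init_inference` helper that constructs the `InferenceEngine` for you. The sketch below is meant to be equivalent to the manual wrapping above, assuming the 0.4/0.5-era API (argument names have shifted in later releases); since it goes through the same engine, it is unlikely to dodge the bug by itself:

```python
import torch
import deepspeed
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "facebook/bart-large"
model = AutoModelForSeq2SeqLM.from_pretrained(model_name).half()
tokenizer = AutoTokenizer.from_pretrained(model_name)

# init_inference wraps the model in an InferenceEngine; mp_size sets the
# tensor-parallel degree and should match --num_gpus on the launcher.
engine = deepspeed.init_inference(
    model,
    mp_size=2,
    dtype=torch.half,
)
```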
Top GitHub Comments
I am getting the same error when using the `num_return_sequences` parameter with GPT-Neo. I ran `deepspeed --num_gpus 4 test.py` with the following code in `test.py`. I get `RuntimeError: Tensors must be non-overlapping and dense` errors when running `deepspeed==0.4.0`, but I can confirm that #1168 fixes the issue for me. This code was run with Python 3.7.9, `torch==1.8.1`, and `transformers==4.6.1`.
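The original `test.py` is not shown above. Purely as an illustration, a minimal script of that shape might look like this; the model name, prompt, and generation arguments are assumptions, not the commenter's code:

```python
import torch
from deepspeed import InferenceEngine
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/gpt-neo-1.3B"
model = AutoModelForCausalLM.from_pretrained(model_name).half()
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Same wrapping as the BART repro above, but with 4-way model parallelism.
model = InferenceEngine(model=model, mp_size=4, dtype=torch.half)

tokens = tokenizer.encode("Hello", return_tensors="pt").cuda()

# num_return_sequences > 1 is what triggered the error for this commenter;
# sampling is enabled because multiple return sequences require it (or beams).
output = model.generate(
    tokens,
    do_sample=True,
    max_length=30,
    num_return_sequences=4,
)
```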
Hi @hyunwoongko, thanks for investigating the issue for these new models 👍 I will test the branch and merge this soon 😃

Reza