Bug occurs when using a Hugging Face seq2seq model with the inference engine.
I'm trying to deploy some language models using DeepSpeed's inference engine. Currently I am deploying a seq2seq language model, and I have succeeded in parallelizing it. However, when I try to generate, the error below occurs. Reproduction script:
```python
import torch
from torch.distributed import get_rank
from deepspeed import InferenceEngine
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "facebook/bart-large"
model = AutoModelForSeq2SeqLM.from_pretrained(model_name).half()
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Wrap the model for tensor-parallel inference across two GPUs.
model = InferenceEngine(
    model=model,
    mp_size=2,
    dtype=torch.half,
)
torch.cuda.empty_cache()

tokens = tokenizer.encode(
    "Hello",
    return_tensors="pt",
    truncation=True,
    padding=True,
).cuda()

output = model.generate(
    tokens,
    min_length=30,
    max_length=31,
)  # <--- problem

if get_rank() == 0:
    print(f"Output: {tokenizer.decode(output.tolist()[0])}")
```
Launching with `deepspeed --num_gpus=2 inference.py` then fails with:

```
RuntimeError: Tensors must be non-overlapping and dense
```
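As background (my reading, not confirmed in this thread): this message is typically raised by PyTorch's distributed machinery when a collective is handed a tensor view whose strides do not describe a dense, non-overlapping block of memory. A minimal, DeepSpeed-free illustration of the dense/non-dense distinction:

```python
import torch

x = torch.zeros(4, 4)
view = x.t()                   # transposed view of the same storage
print(view.is_contiguous())    # False: strides no longer describe a dense layout
dense = view.contiguous()      # copies into a fresh, dense buffer
print(dense.is_contiguous())   # True
```

Beam reordering and cache shuffling inside `generate` produce exactly these kinds of strided views, which is presumably what the tensor-parallel collectives choke on here.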
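As a side note, DeepSpeed also exposes a `deepspeed.init_inference` helper that constructs the `InferenceEngine` for you. The sketch below is meant to be equivalent to the manual wrapping above, assuming the 0.4/0.5-era API (argument names have shifted in later releases); since it goes through the same engine, it is unlikely to dodge the bug by itself:

```python
import torch
import deepspeed
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "facebook/bart-large"
model = AutoModelForSeq2SeqLM.from_pretrained(model_name).half()
tokenizer = AutoTokenizer.from_pretrained(model_name)

# init_inference wraps the model in an InferenceEngine; mp_size sets the
# tensor-parallel degree and should match --num_gpus on the launcher.
engine = deepspeed.init_inference(
    model,
    mp_size=2,
    dtype=torch.half,
)
```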
Top GitHub Comments
I am getting the same error when using the `num_return_sequences` parameter with GPT-Neo. I ran `deepspeed --num_gpus 4 test.py` with the following code in `test.py`. I get `RuntimeError: Tensors must be non-overlapping and dense` errors when running `deepspeed==0.4.0`, but I can confirm that #1168 fixes the issue for me. This code was run with Python 3.7.9, `torch==1.8.1`, and `transformers==4.6.1`.
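The original `test.py` is not shown above. Purely as an illustration, a minimal script of that shape might look like this; the model name, prompt, and generation arguments are assumptions, not the commenter's code:

```python
import torch
from deepspeed import InferenceEngine
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/gpt-neo-1.3B"
model = AutoModelForCausalLM.from_pretrained(model_name).half()
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Same wrapping as the BART repro above, but with 4-way model parallelism.
model = InferenceEngine(model=model, mp_size=4, dtype=torch.half)

tokens = tokenizer.encode("Hello", return_tensors="pt").cuda()

# num_return_sequences > 1 is what triggered the error for this commenter;
# sampling is enabled because multiple return sequences require it (or beams).
output = model.generate(
    tokens,
    do_sample=True,
    max_length=30,
    num_return_sequences=4,
)
```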
Hi @hyunwoongko, thanks for investigating the issue for these new models 👍 I will test the branch and merge this soon 😃

Reza