Error when using GPT2 `model.forward` method with DeepSpeed inference

Hi,

I am trying to use DeepSpeed with GPT-2/Neo for inference, but I get an error when I call `model.forward` directly. The error seems to occur when I don’t provide an `attention_mask`.

import deepspeed
from transformers import GPT2Tokenizer, GPT2LMHeadModel
import torch

tokenizer = GPT2Tokenizer.from_pretrained("gpt2-medium")
model = GPT2LMHeadModel.from_pretrained("gpt2-medium")
model = model.cuda()

model = deepspeed.init_inference(model,
                                 mp_size=1,
                                 dtype=torch.float,
                                 replace_method='auto')

ids = tokenizer.encode("A valley full of unicorns was discovered",
                       add_special_tokens=False, return_tensors="pt").cuda()
# attn = torch.ones((1, 8), dtype=torch.long).cuda()
output = model(
    input_ids=ids,
    # attention_mask=attn, # <-- need to provide this
    return_dict=True,
    use_cache=True
)
print(output)

Traceback (most recent call last):
  File "deepspeed_attn.py", line 25, in <module>
    use_cache=True
  File "torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "deepspeed/inference/engine.py", line 222, in forward
    return self.module(*inputs, **kwargs)
  File "torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "transformers/models/gpt2/modeling_gpt2.py", line 954, in forward
    return_dict=return_dict,
  File "torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "transformers/models/gpt2/modeling_gpt2.py", line 797, in forward
    output_attentions=output_attentions,
  File "torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "deepspeed/ops/transformer/inference/transformer_inference.py", line 611, in forward
    self.norm_b)
  File "torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "deepspeed/ops/transformer/inference/transformer_inference.py", line 393, in forward
    self.qkv_merging)
  File "deepspeed/ops/transformer/inference/transformer_inference.py", line 141, in forward
    while len(input_mask.shape) < 4:
AttributeError: 'NoneType' object has no attribute 'shape'
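
The traceback ends on a line that reads `input_mask.shape`, so the injected kernel appears to expect a mask tensor rather than `None`. One possible workaround, in line with the commented-out lines in the script above, is to build the mask explicitly before the call. The following is a minimal sketch, assuming the batch contains no padding (so every position should be attended to). `forward_with_default_mask` is a hypothetical helper name, not part of DeepSpeed or Transformers.

```python
import torch


def forward_with_default_mask(model, input_ids, attention_mask=None, **kwargs):
    """Call `model`, supplying an all-ones attention mask when none is given.

    Hypothetical helper for illustration. An all-ones mask is only
    appropriate when `input_ids` contains no padding tokens.
    """
    if attention_mask is None:
        # ones_like keeps the shape and device of input_ids, so the mask
        # lands on the same GPU as the inputs.
        attention_mask = torch.ones_like(input_ids, dtype=torch.long)
    return model(input_ids=input_ids, attention_mask=attention_mask, **kwargs)
```

With the script above, the call would become `forward_with_default_mask(model, ids, return_dict=True, use_cache=True)`. Deriving the mask from `input_ids` avoids hard-coding the sequence length, as the commented-out `torch.ones((1, 8), ...)` line does.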

When I use `model.generate`, as in the example scripts, I don’t get any errors, because `generate` creates the `attention_mask` itself. In general, though, I think `model.forward` should work even without an `attention_mask`:

https://github.com/huggingface/transformers/blob/1ed2ebf60d87ef12bd063c7c58e484e19189c754/src/transformers/generation_utils.py#L395
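
The linked line is where `generate` fills in a missing `attention_mask`. To make that idea concrete, here is a simplified sketch of a padding-aware default. It is loosely modeled on that behavior, not copied from Transformers, and the function name `default_attention_mask` is hypothetical. It assumes that padded positions can be identified by `pad_token_id`, and that this is only reliable when the pad token differs from the EOS token.

```python
import torch


def default_attention_mask(input_ids, pad_token_id=None, eos_token_id=None):
    """Return a 0/1 mask for `input_ids` (hypothetical helper, a sketch only).

    Positions equal to `pad_token_id` are masked out, but only when the pad
    token actually occurs in the input and is distinct from the EOS token.
    Otherwise every position is attended to.
    """
    has_pad = pad_token_id is not None and bool((input_ids == pad_token_id).any())
    pad_is_distinct = eos_token_id is None or pad_token_id != eos_token_id
    if has_pad and pad_is_distinct:
        # 1 where the token is real, 0 where it is padding.
        return input_ids.ne(pad_token_id).long()
    return torch.ones_like(input_ids, dtype=torch.long)
```

For the single unpadded prompt in this issue, the function returns an all-ones mask with the same shape as `ids`.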

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 10 (6 by maintainers)

Top GitHub Comments

1 reaction
Express50 commented, Jul 7, 2021

Thanks! Just tested it on my end as well and it works both with and without attention_mask.

1 reaction
RezaYazdaniAminabadi commented, Jul 6, 2021

I see. I will fix it soon. Thanks for trying it out 👍
