
[BUG] Inference kernels don't handle Huggingface attention_mask correctly


Describe the bug

When I use DeepSpeed's inference kernels with Huggingface transformers and pass in an attention_mask that masks out some tokens, the mask affects the output in strange ways. In my repro code this results in very different logits; when sampling it produces garbage samples.
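For context, a padded Huggingface batch carries a binary attention_mask alongside the input ids; positions with a 0 are padding the model is supposed to ignore. A minimal sketch (illustrative, not part of the original report) of what that looks like:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no dedicated pad token

enc = tokenizer(["This is a test sentence."], padding="max_length", max_length=8)
print(enc["input_ids"])       # real token ids followed by pad (eos) token ids
print(enc["attention_mask"])  # e.g. [[1, 1, 1, 1, 1, 1, 0, 0]]; 0s mark padding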

To Reproduce

Run the following code:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
import deepspeed

DEEPSPEED = True  # flip to False to run the plain Huggingface baseline

device = torch.device("cuda")

model = AutoModelForCausalLM.from_pretrained("gpt2")
model.to(device)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no dedicated pad token

if DEEPSPEED:
    # Wrap the model with DeepSpeed's fp16 inference kernels.
    ds_engine = deepspeed.init_inference(
        model, mp_size=1, dtype=torch.half, checkpoint=None, replace_method="auto"
    )
    model = ds_engine.module

text = ["This is a test sentence."]

# Reference run: the sentence with no padding at all.
no_padding = tokenizer(text)
no_padding_logits = model(torch.tensor(no_padding["input_ids"], device=device)).logits

# Same sentence padded to length 32, with the padding masked out.
with_padding = tokenizer(text, padding="max_length", max_length=32)
with_padding_logits = model(
    torch.tensor(with_padding["input_ids"], device=device),
    attention_mask=torch.tensor(with_padding["attention_mask"], device=device),
).logits

# Compare logits at the real (unpadded) positions only.
difference = torch.max(
    torch.abs(no_padding_logits - with_padding_logits[:, : no_padding_logits.shape[1]])
).item()

print(f"Max difference: {difference:.2g}")
assert difference <= 2e-4

I get:

Max difference: 0.25
Traceback (most recent call last):
  File "/home/ubuntu/unity/adversarial/test_deepspeed_inference.py", line 38, in <module>
    assert difference <= 2e-4
AssertionError

Expected behavior

The assertion should pass. Adding masked-out padding tokens after the tokens in question should not dramatically shift the output. Indeed, when I run with DEEPSPEED = False, the max difference is only 0.00011.
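To make the expectation concrete: the standard way an attention_mask is honored is to fold it into the attention scores before the softmax, so masked key positions get essentially zero attention weight and cannot move the logits at the real tokens. A minimal sketch of that scheme (illustrative names; this is not the DeepSpeed kernel code):

import torch

def masked_softmax(scores: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    # scores: [batch, heads, q_len, k_len]; attention_mask: [batch, k_len], 1 = keep
    keep = attention_mask[:, None, None, :].to(torch.bool)
    scores = scores.masked_fill(~keep, torch.finfo(scores.dtype).min)
    return torch.softmax(scores, dim=-1)  # masked key columns get ~0 weight

A kernel that drops or misapplies this bias would explain exactly the symptom above: the padded positions leak into the attention weights and shift every logit.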

ds_report output

--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
      runtime if needed. Op compatibility means that your system
      meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-dev package with apt
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/home/ubuntu/.asdf/installs/python/3.9.6/lib/python3.9/site-packages/torch']
torch version .................... 1.9.0+cu111
torch cuda version ............... 11.1
nvcc version ..................... 11.1
deepspeed install path ........... ['/home/ubuntu/.asdf/installs/python/3.9.6/lib/python3.9/site-packages/deepspeed']
deepspeed info ................... 0.5.4, unknown, unknown
deepspeed wheel compiled w. ...... torch 1.9, cuda 11.1

System info:

  • OS: Ubuntu 18.04
  • GPU: one V100-16GB
  • Interconnects: n/a
  • Python version: 3.9.6

Launcher context

Just running directly: one process on one GPU.

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 8 (4 by maintainers)

Top GitHub Comments

1 reaction
RezaYazdaniAminabadi commented, Oct 29, 2021

Thanks @daniel-ziegler for testing this and happy to see the issue is solved 😃

1 reaction
daniel-ziegler commented, Oct 28, 2021

Oh, and with .half() for the baseline, as you pointed out:

no deepspeed
Max difference: 0.11
Mean difference: 0.032

So DeepSpeed looks pretty good after this fix.
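For reference, those baseline numbers presumably come from rerunning the repro with DEEPSPEED = False and the Huggingface model cast to fp16; a hedged sketch of that comparison (the mean-difference line is an addition to the original script):

# Assumes the same tokenizer/device/no_padding/with_padding as in the repro;
# only the .half() cast and the mean difference are new here.
model = AutoModelForCausalLM.from_pretrained("gpt2").half().to(device)

no_padding_logits = model(torch.tensor(no_padding["input_ids"], device=device)).logits
with_padding_logits = model(
    torch.tensor(with_padding["input_ids"], device=device),
    attention_mask=torch.tensor(with_padding["attention_mask"], device=device),
).logits

diff = torch.abs(no_padding_logits - with_padding_logits[:, : no_padding_logits.shape[1]])
print(f"Max difference: {diff.max().item():.2g}")   # ~0.11 reported above
print(f"Mean difference: {diff.mean().item():.2g}") # ~0.032 reported above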
