T5 fp16 crashes on the CPU (but works on CUDA)

Environment info

  • transformers version: 4.5.1
  • Platform: Linux-4.15.0-126-generic-x86_64-with-glibc2.10 (Ubuntu 18.04)
  • Python version: 3.8.0
  • PyTorch version (GPU?): 1.7.1 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Using GPU in script?: yes (Nvidia Quadro RTX 8000)
  • Using distributed or parallel set-up in script?: no

Information

I am using T5EncoderModel.

The problem arises when using:

  • my own modified scripts (reproduction script below)

The task I am working on is:

  • my own task or dataset: embedding protein sequences with Rostlab/prot_t5_xl_uniref50

To reproduce

Steps to reproduce the behavior:

  1. Generate embeddings with a half-precision T5EncoderModel on CUDA and observe that it works (script below).
  2. Run the same script on the CPU and observe that it crashes with RuntimeError: "baddbmm__mkl" not implemented for 'Half':

import torch
from numpy import ndarray
from transformers import T5EncoderModel, T5Tokenizer

# This is a protein sequence; each character is one amino acid, which is equivalent to one word
protein = "MKKLFVVLVVMPLIYGDNFPCSKLTNRTIGNHWNLIETFLLNYSSRLPPNSDVVLGDYFPTVQPWFNCIRNNSNDLYVTLENLKALYWDYAKETITWNHKQRLNVVVNGYPYSITVTTTRNFNSAEGAIICICKGSPPTTTTESSLTCNWGSECRLNHKFPICPSNSESNCGNMLYGLQWFADE"


def embed(
    sequence: str,
    model: T5EncoderModel,
    tokenizer: T5Tokenizer,
) -> ndarray:
    # Every amino acid is a "word", so separate the residues with spaces for the tokenizer
    sequence = " ".join(list(sequence))

    ids = tokenizer.batch_encode_plus(
        [sequence], add_special_tokens=True, padding="longest"
    )
    tokenized_sequences = torch.tensor(ids["input_ids"]).to(model.device)
    attention_mask = torch.tensor(ids["attention_mask"]).to(model.device)
    with torch.no_grad():
        embeddings = model(input_ids=tokenized_sequences, attention_mask=attention_mask)

    return embeddings[0].cpu().numpy()


def main():
    model_name = "Rostlab/prot_t5_xl_uniref50"
    tokenizer = T5Tokenizer.from_pretrained(model_name, do_lower_case=False)
    model = T5EncoderModel.from_pretrained(model_name)
    model = model.half()

    # This passes
    model = model.to(torch.device("cuda")).eval()
    embed(protein, model, tokenizer)

    # This fails
    model = model.to(torch.device("cpu")).eval()
    embed(protein, model, tokenizer)


if __name__ == "__main__":
    main()

Running the script produces:

Traceback (most recent call last):
  File "test-data/t5_cpu.py", line 44, in <module>
    main()
  File "test-data/t5_cpu.py", line 40, in main
    embed(protein, model, tokenizer)
  File "test-data/t5_cpu.py", line 23, in embed
    embeddings = model(input_ids=tokenized_sequences, attention_mask=attention_mask)
  File "/mnt/project/seqvec-search/.venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/project/seqvec-search/.venv/lib/python3.8/site-packages/transformers/models/t5/modeling_t5.py", line 1728, in forward
    encoder_outputs = self.encoder(
  File "/mnt/project/seqvec-search/.venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/project/seqvec-search/.venv/lib/python3.8/site-packages/transformers/models/t5/modeling_t5.py", line 948, in forward
    layer_outputs = layer_module(
  File "/mnt/project/seqvec-search/.venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/project/seqvec-search/.venv/lib/python3.8/site-packages/transformers/models/t5/modeling_t5.py", line 631, in forward
    self_attention_outputs = self.layer[0](
  File "/mnt/project/seqvec-search/.venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/project/seqvec-search/.venv/lib/python3.8/site-packages/transformers/models/t5/modeling_t5.py", line 538, in forward
    attention_output = self.SelfAttention(
  File "/mnt/project/seqvec-search/.venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/project/seqvec-search/.venv/lib/python3.8/site-packages/transformers/models/t5/modeling_t5.py", line 477, in forward
    scores = torch.matmul(
RuntimeError: "baddbmm__mkl" not implemented for 'Half'

Expected behavior

Computing embeddings should also work on the CPU and give approximately the same results as on the GPU.
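
Until fp16 matmul is supported on the CPU, a possible workaround is to keep the half-precision weights for CUDA only and cast back to fp32 for CPU inference. A minimal sketch, replacing the body of main() in the script above (note that fp32 weights roughly double the memory footprint on the CPU):

# Workaround sketch: fp16 on CUDA, fp32 on the CPU
model = T5EncoderModel.from_pretrained(model_name)
if torch.cuda.is_available():
    model = model.half().to(torch.device("cuda"))
else:
    model = model.float().to(torch.device("cpu"))
model = model.eval()
embed(protein, model, tokenizer)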

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 5 (4 by maintainers)

Top GitHub Comments

2 reactions
stas00 commented on Jun 3, 2021

Probably the best way to make things shift is to comment on the PyTorch side and voice the need/importance. I don't think there is anything we can do to fix this on the transformers side at the moment. Please correct me if I'm wrong.

1 reaction
konstin commented on Jun 3, 2021

This is still relevant, as https://github.com/pytorch/pytorch/issues/55374 is still unresolved.
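
Until that PyTorch issue is resolved, a defensive guard in user code can upcast the weights instead of crashing deep inside the attention stack. A hypothetical helper, not part of transformers, assuming all weights share one dtype:

import torch
from transformers import T5EncoderModel

def ensure_cpu_supported_dtype(model: T5EncoderModel) -> T5EncoderModel:
    # fp16 matmuls are not implemented on the CPU in PyTorch 1.7.x,
    # so upcast to fp32 whenever the weights live on the CPU.
    param = next(model.parameters())
    if param.device.type == "cpu" and param.dtype == torch.float16:
        model = model.float()
    return model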
