T5 fp16 crashes on the CPU (but works on CUDA)
Environment info
- transformers version: 4.5.1
- Platform: Linux-4.15.0-126-generic-x86_64-with-glibc2.10 (ubuntu 18.04)
- Python version: 3.8.0
- PyTorch version (GPU?): 1.7.1 (True)
- Tensorflow version (GPU?): not installed (NA)
- Using GPU in script?: Nvidia Quadro RTX 8000
- Using distributed or parallel set-up in script?: no
Who can help
Models:
Information
I am using T5EncoderModel.
The problem arises when using:
- the official example scripts: (give details below)
- my own modified scripts: (give details below)
The task I am working on is:
- an official GLUE/SQUaD task: (give the name)
- my own task or dataset: (give details below)
To reproduce
Steps to reproduce the behavior:
- Try to generate embeddings with half precision T5EncoderModel on CUDA, see that it works (script below)
- Try the same thing on the CPU, see that it crashes (RuntimeError: "baddbmm__mkl" not implemented for 'Half')
import torch
from numpy import ndarray
from transformers import T5EncoderModel, T5Tokenizer

# This is a protein sequence; each character is one amino acid, which is equivalent to one word
protein = "MKKLFVVLVVMPLIYGDNFPCSKLTNRTIGNHWNLIETFLLNYSSRLPPNSDVVLGDYFPTVQPWFNCIRNNSNDLYVTLENLKALYWDYAKETITWNHKQRLNVVVNGYPYSITVTTTRNFNSAEGAIICICKGSPPTTTTESSLTCNWGSECRLNHKFPICPSNSESNCGNMLYGLQWFADE"

def embed(
    sequence: str,
    model: T5EncoderModel,
    tokenizer: T5Tokenizer,
) -> ndarray:
    # Every amino acid is a "word"
    sequence = " ".join(list(sequence))
    ids = tokenizer.batch_encode_plus(
        [sequence], add_special_tokens=True, padding="longest"
    )

    tokenized_sequences = torch.tensor(ids["input_ids"]).to(model.device)
    attention_mask = torch.tensor(ids["attention_mask"]).to(model.device)

    with torch.no_grad():
        embeddings = model(input_ids=tokenized_sequences, attention_mask=attention_mask)

    return embeddings[0].cpu().numpy()


def main():
    model_name = "Rostlab/prot_t5_xl_uniref50"
    tokenizer = T5Tokenizer.from_pretrained(model_name, do_lower_case=False)
    model = T5EncoderModel.from_pretrained(model_name)
    model = model.half()

    # This passes
    model = model.to(torch.device("cuda")).eval()
    embed(protein, model, tokenizer)

    # This fails
    model = model.to(torch.device("cpu")).eval()
    embed(protein, model, tokenizer)


if __name__ == "__main__":
    main()

Traceback (most recent call last):
  File "test-data/t5_cpu.py", line 44, in <module>
    main()
  File "test-data/t5_cpu.py", line 40, in main
    embed(protein, model, tokenizer)
  File "test-data/t5_cpu.py", line 23, in embed
    embeddings = model(input_ids=tokenized_sequences, attention_mask=attention_mask)
  File "/mnt/project/seqvec-search/.venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/project/seqvec-search/.venv/lib/python3.8/site-packages/transformers/models/t5/modeling_t5.py", line 1728, in forward
    encoder_outputs = self.encoder(
  File "/mnt/project/seqvec-search/.venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/project/seqvec-search/.venv/lib/python3.8/site-packages/transformers/models/t5/modeling_t5.py", line 948, in forward
    layer_outputs = layer_module(
  File "/mnt/project/seqvec-search/.venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/project/seqvec-search/.venv/lib/python3.8/site-packages/transformers/models/t5/modeling_t5.py", line 631, in forward
    self_attention_outputs = self.layer[0](
  File "/mnt/project/seqvec-search/.venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/project/seqvec-search/.venv/lib/python3.8/site-packages/transformers/models/t5/modeling_t5.py", line 538, in forward
    attention_output = self.SelfAttention(
  File "/mnt/project/seqvec-search/.venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/project/seqvec-search/.venv/lib/python3.8/site-packages/transformers/models/t5/modeling_t5.py", line 477, in forward
    scores = torch.matmul(
RuntimeError: "baddbmm__mkl" not implemented for 'Half'
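The traceback ends in a `torch.matmul` call on half-precision tensors, so the failure can probably be checked without loading the model at all. The following is a minimal sketch of such a probe. The function name `cpu_half_matmul_supported` and the tensor shapes are hypothetical, chosen only for illustration, and whether the call succeeds depends on the installed PyTorch build.

```python
import torch


def cpu_half_matmul_supported() -> bool:
    """Return True if a batched fp16 matmul runs on the CPU in this PyTorch build."""
    # Hypothetical shapes (batch, heads, seq_len, head_dim), for illustration only.
    query = torch.randn(1, 2, 4, 8).half()
    key = torch.randn(1, 2, 4, 8).half()
    try:
        # Same kind of call as the one at the bottom of the traceback:
        # a matmul between half-precision tensors that live on the CPU.
        torch.matmul(query, key.transpose(3, 2))
    except RuntimeError:
        # Builds without a CPU half kernel raise here, as in the report above.
        return False
    return True


if __name__ == "__main__":
    print(torch.__version__, cpu_half_matmul_supported())
```

If the probe returns `False`, the crash is expected for any fp16 model on the CPU with that build, independent of T5.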
Expected behavior
Computing embeddings should also work on the CPU and give approximately the same results as on the GPU.
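One way to make "approximately the same" concrete is an element-wise comparison with a tolerance loose enough for fp16 rounding. This is only a sketch: the helper name `embeddings_close` and the tolerances `atol=1e-2` and `rtol=1e-2` are assumptions, not values taken from the issue, and suitable thresholds depend on the model.

```python
import numpy as np


def embeddings_close(a, b, atol: float = 1e-2, rtol: float = 1e-2) -> bool:
    """Compare two embedding arrays after upcasting both to float32."""
    a32 = np.asarray(a, dtype=np.float32)
    b32 = np.asarray(b, dtype=np.float32)
    # Arrays of different shape are never considered close.
    if a32.shape != b32.shape:
        return False
    return bool(np.allclose(a32, b32, atol=atol, rtol=rtol))
```

With the `embed` function from the script, this could be called as `embeddings_close(gpu_embeddings, cpu_embeddings)` on the two returned arrays.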
Issue Analytics
- State:
- Created 2 years ago
- Comments:5 (4 by maintainers)
Top GitHub Comments
Probably the best way to make things shift is to comment in the pytorch land and voice the need/importance. I don’t think there is anything that we could do to fix this in the transformers side at the moment. Please correct me if I’m wrong.

This is still relevant, as https://github.com/pytorch/pytorch/issues/55374 is still unresolved.
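Until the PyTorch side changes, a possible user-side workaround is to pick the dtype per device: float16 on CUDA and float32 on the CPU. This was not proposed in the thread. It is a sketch under that assumption, and `move_with_supported_dtype` is a hypothetical helper name. It relies only on `torch.nn.Module.to`, which accepts both `device` and `dtype`.

```python
import torch
from torch import nn


def move_with_supported_dtype(model: nn.Module, device: torch.device) -> nn.Module:
    """Move `model` to `device`, using float32 on the CPU and float16 elsewhere."""
    # Assumption: half precision is only wanted where the kernels support it.
    dtype = torch.float32 if device.type == "cpu" else torch.float16
    # Module.to casts floating-point parameters and buffers to `dtype`.
    return model.to(device=device, dtype=dtype).eval()
```

In the script above, `model = move_with_supported_dtype(model, torch.device("cpu"))` would replace the plain `.to(torch.device("cpu"))` call. The trade-off is that float32 weights take twice the memory of float16 weights.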