
Successfully loaded torchscript model failed with "CUDA error: CUBLAS_STATUS_NOT_INITIALIZED" when called for inference

See original GitHub issue

Description

I converted a PyTorch model to TorchScript using the following script: https://gist.github.com/keskarnitish/1061cbd101ab186e2d80c7877517e7ee#file-saved_pytorch_model-py.
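For reference, a minimal sketch of a tracing-based conversion along the lines of that gist (the checkpoint name and example text below are illustrative assumptions, not taken from the gist):

import torch
from transformers import BertForSequenceClassification, BertTokenizer

# Illustrative assumptions: checkpoint name and example text are placeholders.
model = BertForSequenceClassification.from_pretrained(
    'bert-base-uncased', torchscript=True)
model.eval()

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
example_inputs = tokenizer.encode_plus('an example sentence', return_tensors='pt')

# Trace the model with the example input_ids and serialize the graph for Triton.
traced_model = torch.jit.trace(model, example_inputs['input_ids'])
torch.jit.save(traced_model, 'model.pt')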

I tested the model using

import torch

model = torch.jit.load('model.pt')
# example_inputs is the same dict of tensors that was fed to torch.jit.trace
example_outputs = model(example_inputs['input_ids'])

and it worked as expected.

I then deployed tritonserver:20.03-py3 on GKE on a node with a T4 GPU.

I ran nvidia-smi on the node and got:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67       Driver Version: 418.67       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   62C    P0    32W /  70W |   3163MiB / 15079MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

The Triton server successfully loaded the model on the node. I checked the API status and it reported that the model is ready.
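(As a point of reference, a readiness check against the v1 HTTP API exposed by the 20.03 release looks roughly like the sketch below; the host name and port are assumptions about the GKE service, and "bert" is the model name shown in the server logs.)

import requests

# Hypothetical host; Triton 20.03 serves the v1 HTTP API on port 8000.
TRITON_URL = 'http://triton-host:8000'

# /api/health/ready returns 200 once the server is ready, and
# /api/status/<model> reports per-model state (MODEL_READY when loaded).
print(requests.get(f'{TRITON_URL}/api/health/ready').status_code)
print(requests.get(f'{TRITON_URL}/api/status/bert').text)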

But when I ran perf_client, I got the following in the server logs:

I0525 05:24:42.733448 1 libtorch_backend.cc:538] Running bert with 1 request payloads
I0525 05:24:42.734669 1 pinned_memory_manager.cc:131] pinned memory allocation: size 256, addr 0x7f8a20000090
I0525 05:24:43.009041 1 libtorch_backend.cc:804] CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling `cublasCreate(handle)`
The above operation failed in interpreter.
Traceback (most recent call last):
Serialized   File "code/__torch__.py", line 9
    _0 = self.model
    input_ids = torch.to(data, dtype=4, layout=0, device=torch.device("cuda"), pin_memory=False, non_blocking=False, copy=False, memory_format=None)
    return ((_0).forward(input_ids, ),)
             ~~~~~~~~~~~ <--- HERE
Serialized   File "code/__torch__/transformers/modeling_bert.py", line 10, in forward
    input_ids: Tensor) -> Tensor:
    _0 = self.classifier
    _1 = (self.dropout).forward((self.bert).forward(input_ids, ), )
                                 ~~~~~~~~~~~~~~~~~~ <--- HERE
    return (_0).forward(_1, )
class BertModel(Module):
Serialized   File "code/__torch__/transformers/modeling_bert.py", line 35, in forward
    _12 = torch.to(extended_attention_mask, 6, False, False, None)
    attention_mask0 = torch.mul(torch.rsub(_12, 1., 1), CONSTANTS.c0)
    _13 = (_3).forward((_4).forward(input_ids, input, ), attention_mask0, )
           ~~~~~~~~~~~ <--- HERE
    return (_2).forward(_13, )
class BertEmbeddings(Module):
Serialized   File "code/__torch__/transformers/modeling_bert.py", line 73, in forward
    attention_mask: Tensor) -> Tensor:
    _26 = getattr(self.layer, "1")
    _27 = (getattr(self.layer, "0")).forward(argument_1, attention_mask, )
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    _28 = getattr(self.layer, "2")
    _29 = (_26).forward(_27, attention_mask, )
Serialized   File "code/__torch__/transformers/modeling_bert.py", line 107, in forward
    _49 = self.output
    _50 = self.intermediate
    _51 = (self.attention).forward(argument_1, attention_mask, )
           ~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    _52 = (_49).forward((_50).forward(_51, ), _51, )
    return _52
Serialized   File "code/__torch__/transformers/modeling_bert.py", line 119, in forward
    attention_mask: Tensor) -> Tensor:
    _53 = self.output
    _54 = (self.self).forward(argument_1, attention_mask, )
           ~~~~~~~~~~~~~~~~~~ <--- HERE
    return (_53).forward(_54, argument_1, )
class BertSelfAttention(Module):
Serialized   File "code/__torch__/transformers/modeling_bert.py", line 134, in forward
    _56 = self.value
    _57 = self.key
    _58 = (self.query).forward(argument_1, )
           ~~~~~~~~~~~~~~~~~~~ <--- HERE
    _59 = (_57).forward(argument_1, )
    _60 = (_56).forward(argument_1, )
/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py(1612): linear
/usr/local/lib/python3.6/dist-packages/torch/nn/modules/linear.py(87): forward
/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py(534): _slow_forward
/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py(548): __call__
/usr/local/lib/python3.6/dist-packages/transformers/modeling_bert.py(216): forward
/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py(534): _slow_forward
/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py(548): __call__
/usr/local/lib/python3.6/dist-packages/transformers/modeling_bert.py(314): forward
/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py(534): _slow_forward
/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py(548): __call__
/usr/local/lib/python3.6/dist-packages/transformers/modeling_bert.py(368): forward
/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py(534): _slow_forward
/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py(548): __call__
/usr/local/lib/python3.6/dist-packages/transformers/modeling_bert.py(407): forward
/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py(534): _slow_forward
/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py(548): __call__
/usr/local/lib/python3.6/dist-packages/transformers/modeling_bert.py(734): forward
/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py(534): _slow_forward
/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py(548): __call__
/usr/local/lib/python3.6/dist-packages/transformers/modeling_bert.py(1142): forward
/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py(534): _slow_forward
/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py(548): __call__
<ipython-input-2-afc347149dec>(9): forward
/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py(534): _slow_forward
/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py(548): __call__
/usr/local/lib/python3.6/dist-packages/torch/jit/__init__.py(1027): trace_module
/usr/local/lib/python3.6/dist-packages/torch/jit/__init__.py(875): trace
<ipython-input-2-afc347149dec>(13): <module>
/usr/local/lib/python3.6/dist-packages/IPython/core/interactiveshell.py(2882): run_code
/usr/local/lib/python3.6/dist-packages/IPython/core/interactiveshell.py(2822): run_ast_nodes
/usr/local/lib/python3.6/dist-packages/IPython/core/interactiveshell.py(2718): run_cell
/usr/local/lib/python3.6/dist-packages/ipykernel/zmqshell.py(537): run_cell
/usr/local/lib/python3.6/dist-packages/ipykernel/ipkernel.py(208): do_execute
/usr/local/lib/python3.6/dist-packages/ipykernel/kernelbase.py(399): execute_request
/usr/local/lib/python3.6/dist-packages/ipykernel/kernelbase.py(233): dispatch_shell
/usr/local/lib/python3.6/dist-packages/ipykernel/kernelbase.py(283): dispatcher
/usr/local/lib/python3.6/dist-packages/tornado/stack_context.py(277): null_wrapper
/usr/local/lib/python3.6/dist-packages/zmq/eventloop/zmqstream.py(438): _run_callback
/usr/local/lib/python3.6/dist-packages/zmq/eventloop/zmqstream.py(486): _handle_recv
/usr/local/lib/python3.6/dist-packages/zmq/eventloop/zmqstream.py(456): _handle_events
/usr/local/lib/python3.6/dist-packages/tornado/stack_context.py(277): null_wrapper
/usr/local/lib/python3.6/dist-packages/tornado/ioloop.py(888): start
/usr/local/lib/python3.6/dist-packages/ipykernel/kernelapp.py(499): start
/usr/local/lib/python3.6/dist-packages/traitlets/config/application.py(664): launch_instance
/usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py(16): <module>
/usr/lib/python3.6/runpy.py(85): _run_code
/usr/lib/python3.6/runpy.py(193): _run_module_as_main
Serialized   File "code/__torch__/torch/nn/modules/linear.py", line 9, in forward
    argument_1: Tensor) -> Tensor:
    _0 = self.bias
    output = torch.matmul(argument_1, torch.t(self.weight))
             ~~~~~~~~~~~~ <--- HERE
    return torch.add_(output, _0, alpha=1)

I0525 05:24:43.009080 1 pinned_memory_manager.cc:158] pinned memory deallocation: addr 0x7f8a20000090

Triton Information

What version of Triton are you using? 20.03

Are you using the Triton container or did you build it yourself? Triton container

To Reproduce

Steps to reproduce the behavior: see the description above.

Expected behavior

The server should not return any error.

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 17 (9 by maintainers)

Top GitHub Comments

1 reaction
deadeyegoodwin commented, May 27, 2020

Thanks for the detailed bug report, we will take a look.

0 reactions
CoderHam commented, Jun 22, 2020

@katie-cathy-hunt please verify the host system has its CUDA environment set up correctly. I am closing this ticket for now since we are unable to reproduce the error with the appropriate environment. Please re-open if you still see this failure. @ethem-kinginthenorth please test the same with the upcoming 20.06 release; the V2 APIs were made more robust in that release.
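One way to verify the point about the CUDA environment is to force cuBLAS initialization directly, since the failing linear layer hits cuBLAS through torch.matmul. A minimal sketch, assuming a GPU-enabled PyTorch install is available on the same node:

import torch

# A GPU matmul forces cuBLAS initialization, so this fails with the same
# CUBLAS_STATUS_NOT_INITIALIZED error if the node's CUDA environment
# (driver/runtime pairing, visible devices) is broken.
assert torch.cuda.is_available(), 'CUDA is not visible to PyTorch'
x = torch.randn(4, 4, device='cuda')
y = torch.matmul(x, x)  # same cuBLAS path as the failing Linear layer
torch.cuda.synchronize()
print('cuBLAS initialized OK, result shape:', tuple(y.shape))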

Read more comments on GitHub >
