
Out of memory error for batch size greater than 1 for T5 models

See original GitHub issue

hey, first of all, thanks for creating this amazing library!

I’m following your T5 implementation with TensorRT: https://github.com/ELS-RD/transformer-deploy/blob/b52850dce004212225edcaa7b80fccc311398038/t5.py#L222

I’m trying to convert the ONNX export of the T5 model to a TensorRT engine using your build_engine method: https://github.com/ELS-RD/transformer-deploy/blob/1f2d2c1d8d0239fca7679f8c550a954ea1445cfa/src/transformer_deploy/backends/trt_utils.py#L64
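
For context, a build with a dynamic batch dimension looks roughly like the sketch below. This is a minimal plain-TensorRT example rather than the repository’s build_engine helper; the input name ("input_ids"), the shapes and the workspace size are assumptions and should match the actual ONNX export (which may also have an attention_mask input needing its own profile entry).

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.INFO)

def build_plan(onnx_path: str, plan_path: str, max_batch: int = 5, max_seq: int = 128) -> None:
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, TRT_LOGGER)
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            errors = [str(parser.get_error(i)) for i in range(parser.num_errors)]
            raise RuntimeError("ONNX parsing failed: " + "; ".join(errors))

    config = builder.create_builder_config()
    config.max_workspace_size = 10 * 1024 ** 3  # TRT 8.2 API; newer versions use set_memory_pool_limit
    config.set_flag(trt.BuilderFlag.FP16)

    # The dynamic batch dimension is declared here: min batch 1, optimal/max batch = max_batch.
    profile = builder.create_optimization_profile()
    profile.set_shape("input_ids", (1, 1), (max_batch, max_seq), (max_batch, max_seq))
    config.add_optimization_profile(profile)

    serialized = builder.build_serialized_network(network, config)
    if serialized is None:
        raise RuntimeError("Engine build failed, see the TensorRT log")
    with open(plan_path, "wb") as f:
        f.write(serialized)

The min/opt/max triple in the optimization profile is what lets TensorRT plan for batch sizes above 1 without rebuilding the engine.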

It works fine for a batch size of 1, but for batch size > 1 the build takes much longer (almost an hour just for the t5-small encoder), and even then the engine is not built successfully; I get the following error:

[03/18/2022-12:51:55] [TRT] [E] 2: [virtualMemoryBuffer.cpp::resizePhysical::161] Error Code 2: OutOfMemory (no further information)
[03/18/2022-12:51:55] [TRT] [E] 2: [virtualMemoryBuffer.cpp::resizePhysical::161] Error Code 2: OutOfMemory (no further information)
[03/18/2022-12:51:55] [TRT] [E] 10: [optimizer.cpp::computeCosts::2011] Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[encoder.embed_tokens.weight...Mul_406]}.)
[03/18/2022-12:51:55] [TRT] [E] 2: [builder.cpp::buildSerializedNetwork::609] Error Code 2: Internal Error (Assertion enginePtr != nullptr failed. )
Traceback (most recent call last):
  File "export_onnx_to_trt.py", line 100, in <module>
    build_t5_engine(onnx_encoder_path, trt_encoder_path, [input_id_shape])
  File "export_onnx_to_trt.py", line 86, in build_t5_engine
    engine: ICudaEngine = build_engine(
  File "/app/utils.py", line 209, in build_engine
    engine: ICudaEngine = runtime.deserialize_cuda_engine(trt_engine)
TypeError: deserialize_cuda_engine(): incompatible function arguments. The following argument types are supported:
    1. (self: tensorrt.tensorrt.Runtime, serialized_engine: buffer) -> tensorrt.tensorrt.ICudaEngine

Invoked with: <tensorrt.tensorrt.Runtime object at 0x7f380bbf8930>, None
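
For what it’s worth, the TypeError at the bottom looks like a downstream symptom rather than the root cause: when the OutOfMemory errors make the build fail, buildSerializedNetwork returns None, and passing None to deserialize_cuda_engine raises exactly this error. A hedged guard sketch (builder, network, config and runtime stand for the objects used in the build code above):

serialized_engine = builder.build_serialized_network(network, config)
if serialized_engine is None:
    # The [TRT] [E] OutOfMemory / "Could not find any implementation" messages mean
    # the build failed; stop here instead of passing None to deserialize_cuda_engine().
    raise RuntimeError("TensorRT engine build failed, see the [TRT] [E] messages above")
engine = runtime.deserialize_cuda_engine(serialized_engine)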

Some system info, if that helps:

  • TensorRT + CUDA: 8.2.1-1+cuda11.4
  • OS: Ubuntu 20.04.3
  • GPU: T4 with 15 GB memory

The errors say I need more GPU memory. I was wondering how much GPU memory you used for a batch size of 5? Or maybe I’m missing something?

I would really appreciate any help, thank you!
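
Regarding the memory question above, one knob worth checking, assuming the failure really is builder memory pressure on the 15 GB T4: measure how much GPU memory is actually free and cap the builder workspace below it, keeping the profile’s max shapes as small as the use case allows. A sketch (the 4 GiB margin is a guess, config is the builder config from the sketch above, and torch.cuda.mem_get_info needs a reasonably recent PyTorch):

import torch

free_bytes, total_bytes = torch.cuda.mem_get_info(0)
print(f"GPU memory: {free_bytes / 2**30:.1f} GiB free / {total_bytes / 2**30:.1f} GiB total")

# Leave headroom for the parsed network and the CUDA context before the tactic search.
config.max_workspace_size = max(free_bytes - 4 * 2**30, 2**30)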

Issue Analytics

  • State: open
  • Created: 2 years ago
  • Reactions: 1
  • Comments: 15 (9 by maintainers)

Top GitHub Comments

1 reaction
pommedeterresautee commented, Jun 2, 2022

T5 requires good support for the ONNX If node, which was only recently added to ONNX Runtime (master branch only). Triton support will be added once ONNX Runtime 1.12 is released (sometime in June) and a Triton build with the ONNX Runtime 1.12 engine ships.
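
A quick way to check whether the installed ONNX Runtime is recent enough, using the 1.12 threshold mentioned above (a sketch; builds from master report a dev version string and may need a different check):

import onnxruntime as ort

major, minor = (int(x) for x in ort.__version__.split(".")[:2])
if (major, minor) < (1, 12):
    print(f"onnxruntime {ort.__version__} predates the ONNX If-node improvements; "
          "use a master/nightly build or wait for the 1.12 release")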

0 reactions
pommedeterresautee commented, Aug 1, 2022

FYI, Triton 22.07 has been released. It fixes a bug where ORT tensors were always placed in host memory (plus it’s built with ORT 1.12.0, which also had its own memory placement bug).

Updated code of this repo (there are some subtleties to manage, not just an update of the docker image):

https://github.com/ELS-RD/transformer-deploy/pull/116

It’s in review; I can’t guarantee it works for everything.

Let us know if it helps regarding your issue.
