Out of memory error for batch size greater than 1 for T5 models.
See original GitHub issue

Hey, first of all, thanks for creating this amazing library!
I’m following your T5 implementation with TensorRT: https://github.com/ELS-RD/transformer-deploy/blob/b52850dce004212225edcaa7b80fccc311398038/t5.py#L222
And I’m trying to convert the ONNX version of the T5 model to a TensorRT engine using your build_engine method:
https://github.com/ELS-RD/transformer-deploy/blob/1f2d2c1d8d0239fca7679f8c550a954ea1445cfa/src/transformer_deploy/backends/trt_utils.py#L64
It works fine for a batch size of 1, but for batch size > 1 it takes much longer to build (almost an hour just for the t5-small encoder), and even then the build does not succeed and I get the following error:
[03/18/2022-12:51:55] [TRT] [E] 2: [virtualMemoryBuffer.cpp::resizePhysical::161] Error Code 2: OutOfMemory (no further information)
[03/18/2022-12:51:55] [TRT] [E] 2: [virtualMemoryBuffer.cpp::resizePhysical::161] Error Code 2: OutOfMemory (no further information)
[03/18/2022-12:51:55] [TRT] [E] 10: [optimizer.cpp::computeCosts::2011] Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[encoder.embed_tokens.weight...Mul_406]}.)
[03/18/2022-12:51:55] [TRT] [E] 2: [builder.cpp::buildSerializedNetwork::609] Error Code 2: Internal Error (Assertion enginePtr != nullptr failed. )
Traceback (most recent call last):
  File "export_onnx_to_trt.py", line 100, in <module>
    build_t5_engine(onnx_encoder_path, trt_encoder_path, [input_id_shape])
  File "export_onnx_to_trt.py", line 86, in build_t5_engine
    engine: ICudaEngine = build_engine(
  File "/app/utils.py", line 209, in build_engine
    engine: ICudaEngine = runtime.deserialize_cuda_engine(trt_engine)
TypeError: deserialize_cuda_engine(): incompatible function arguments. The following argument types are supported:
    1. (self: tensorrt.tensorrt.Runtime, serialized_engine: buffer) -> tensorrt.tensorrt.ICudaEngine
Invoked with: <tensorrt.tensorrt.Runtime object at 0x7f380bbf8930>, None
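Note that the final TypeError is a secondary symptom: after the OutOfMemory failure, buildSerializedNetwork returns None, and that None is then passed to deserialize_cuda_engine. A minimal sketch (hypothetical helper, not part of transformer-deploy) of failing fast with a clearer message:

```python
from typing import Any, Optional


def deserialize_or_raise(runtime: Any, serialized_engine: Optional[bytes]) -> Any:
    """Deserialize a TensorRT engine, surfacing build failures clearly.

    buildSerializedNetwork returns None when the build fails (e.g. the
    OutOfMemory errors above); passing that None to deserialize_cuda_engine
    produces the confusing TypeError seen in the traceback.
    """
    if serialized_engine is None:
        raise RuntimeError(
            "TensorRT engine build failed (build returned None); "
            "check the TRT error log above, e.g. for OutOfMemory."
        )
    return runtime.deserialize_cuda_engine(serialized_engine)
```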
Some system info, if that helps:
trt+cuda - 8.2.1-1+cuda11.4
os - ubuntu 20.04.3
gpu - T4 with 15GB memory
The errors say I need more GPU memory. I was wondering how much GPU memory you used for a batch size of 5? Or maybe I’m missing something?
I would really appreciate any help, thank you!
Issue Analytics
- Created 2 years ago
- Reactions:1
- Comments:15 (9 by maintainers)
Top GitHub Comments
T5 work requires good support of the If ONNX node, which was only recently added to ONNX Runtime (master branch only). Triton support will be added when ONNX Runtime 1.12 is released (sometime in June) and Triton ships with an ONNX Runtime 1.12 engine.

FYI, Triton 22.07 has been released. It fixes a bug where ORT tensors were always placed in host memory (plus it’s built with ORT 1.12.0, which had its own memory placement bug).
Updated code for this repo (there are some subtleties to manage, not just an update of the Docker image):
https://github.com/ELS-RD/transformer-deploy/pull/116
Let us know if it helps with your issue.
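Since the If-node support mentioned above only landed for the ONNX Runtime 1.12 release, a simple version guard can catch an incompatible environment early. A minimal sketch (hypothetical helper, not from the repo):

```python
def ort_supports_t5_if_node(version: str) -> bool:
    """Return True when the ONNX Runtime version is at least 1.12.

    Assumes a plain "major.minor[.patch]" version string, e.g. "1.12.0";
    compares the first two components numerically.
    """
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= (1, 12)
```

In practice this would be called with `onnxruntime.__version__` before exporting the T5 decoder.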