Out of memory error for batch size greater than 1 for T5 models.
See original GitHub issue

Hey, first of all, thanks for creating this amazing library!
I’m following your T5 implementation with TensorRT: https://github.com/ELS-RD/transformer-deploy/blob/b52850dce004212225edcaa7b80fccc311398038/t5.py#L222
And I’m trying to convert the ONNX version of the T5 model to a TensorRT engine using your build_engine method:
https://github.com/ELS-RD/transformer-deploy/blob/1f2d2c1d8d0239fca7679f8c550a954ea1445cfa/src/transformer_deploy/backends/trt_utils.py#L64
It works fine for a batch size of 1, but for batch size > 1 it takes much longer to build (almost an hour just for the t5-small encoder), and even then the build does not succeed and I get the following error:
[03/18/2022-12:51:55] [TRT] [E] 2: [virtualMemoryBuffer.cpp::resizePhysical::161] Error Code 2: OutOfMemory (no further information)
[03/18/2022-12:51:55] [TRT] [E] 2: [virtualMemoryBuffer.cpp::resizePhysical::161] Error Code 2: OutOfMemory (no further information)
[03/18/2022-12:51:55] [TRT] [E] 10: [optimizer.cpp::computeCosts::2011] Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[encoder.embed_tokens.weight...Mul_406]}.)
[03/18/2022-12:51:55] [TRT] [E] 2: [builder.cpp::buildSerializedNetwork::609] Error Code 2: Internal Error (Assertion enginePtr != nullptr failed. )
Traceback (most recent call last):
  File "export_onnx_to_trt.py", line 100, in <module>
    build_t5_engine(onnx_encoder_path, trt_encoder_path, [input_id_shape])
  File "export_onnx_to_trt.py", line 86, in build_t5_engine
    engine: ICudaEngine = build_engine(
  File "/app/utils.py", line 209, in build_engine
    engine: ICudaEngine = runtime.deserialize_cuda_engine(trt_engine)
TypeError: deserialize_cuda_engine(): incompatible function arguments. The following argument types are supported:
    1. (self: tensorrt.tensorrt.Runtime, serialized_engine: buffer) -> tensorrt.tensorrt.ICudaEngine
Invoked with: <tensorrt.tensorrt.Runtime object at 0x7f380bbf8930>, None
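Note that the final TypeError is a secondary symptom: after the OutOfMemory failure, buildSerializedNetwork returns None, and that None is then passed to deserialize_cuda_engine. A minimal sketch (hypothetical helper, not part of transformer-deploy) of failing fast with a clearer message:

```python
from typing import Any, Optional


def deserialize_or_raise(runtime: Any, serialized_engine: Optional[bytes]) -> Any:
    """Deserialize a TensorRT engine, surfacing build failures clearly.

    buildSerializedNetwork returns None when the build fails (e.g. the
    OutOfMemory errors above); passing that None to deserialize_cuda_engine
    produces the confusing TypeError seen in the traceback.
    """
    if serialized_engine is None:
        raise RuntimeError(
            "TensorRT engine build failed (build returned None); "
            "check the TRT error log above, e.g. for OutOfMemory."
        )
    return runtime.deserialize_cuda_engine(serialized_engine)
```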
Some system info, if that helps:
trt+cuda - 8.2.1-1+cuda11.4
os - ubuntu 20.04.3
gpu - T4 with 15GB memory
The errors say I need more GPU memory. I was wondering how much GPU memory you used for a batch size of 5? Or maybe I’m missing something?
I would really appreciate any help, thank you!
Issue Analytics
- Created 2 years ago
- Reactions:1
- Comments:15 (9 by maintainers)
Top GitHub Comments
T5 work requires good support of the If ONNX node, which was only recently added to ONNX Runtime (master branch only). Triton support will be added when ONNX Runtime 1.12 is released (sometime in June) and Triton ships with an ONNX Runtime 1.12 engine.

FYI, Triton 22.07 has been released. It fixes a bug where ORT tensors were always placed in host memory (plus it’s built with ORT 1.12.0, which had its own memory placement bug).
Updated code for this repo (there are some subtleties to manage, not just an update of the Docker image):
https://github.com/ELS-RD/transformer-deploy/pull/116
Let us know if it helps with your issue.
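Since the If-node support mentioned above only landed for the ONNX Runtime 1.12 release, a simple version guard can catch an incompatible environment early. A minimal sketch (hypothetical helper, not from the repo):

```python
def ort_supports_t5_if_node(version: str) -> bool:
    """Return True when the ONNX Runtime version is at least 1.12.

    Assumes a plain "major.minor[.patch]" version string, e.g. "1.12.0";
    compares the first two components numerically.
    """
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= (1, 12)
```

In practice this would be called with `onnxruntime.__version__` before exporting the T5 decoder.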