Triton terminated with Signal (6)
When using the Triton gRPC client for inference, Triton sometimes exits unexpectedly. For example, with client code like the following:
```python
import numpy as np
import tritonclient.grpc as grpcclient
from tritonclient.utils import np_to_triton_dtype
from torch.cuda.amp import autocast

# preprocessor, device, and data_loader (a torch DataLoader with 4 workers)
# are defined elsewhere in the script.
with grpcclient.InferenceServerClient('localhost:8001', verbose=False) as client:
    outputs = [
        grpcclient.InferRequestedOutput('logits'),
        grpcclient.InferRequestedOutput('embs')
    ]
    for sent_count, test_batch in enumerate(data_loader):
        with autocast():
            processed_signal, processed_signal_length = preprocessor(
                input_signal=test_batch[0].to(device),
                length=test_batch[1].to(device)
            )
        inputs = [
            grpcclient.InferInput("audio_signal", list(processed_signal.shape), "FP16"),
            grpcclient.InferInput("length", [1, 1], np_to_triton_dtype(np.int32))
        ]
        inputs[0].set_data_from_numpy(processed_signal.cpu().numpy().astype(np.float16))
        inputs[1].set_data_from_numpy(
            processed_signal_length.cpu().numpy().astype(np.int32).reshape(1, 1))
        result = client.infer(model_name="tensorrt_emb",
                              inputs=inputs,
                              outputs=outputs)
```
The tritonserver output is:
```
terminate called after throwing an instance of 'nvinfer1::InternalError'
  what():  Assertion mUsedAllocators.find(alloc) != mUsedAllocators.end() && "Myelin free callback called with invalid MyelinAllocator" failed.
Signal (6) received.
 0# 0x00005602FC4F21B9 in tritonserver
 1# 0x00007FC98736C0C0 in /usr/lib/x86_64-linux-gnu/libc.so.6
 2# gsignal in /usr/lib/x86_64-linux-gnu/libc.so.6
 3# abort in /usr/lib/x86_64-linux-gnu/libc.so.6
 4# 0x00007FC987725911 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
 5# 0x00007FC98773138C in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
 6# 0x00007FC987730369 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
 7# __gxx_personality_v0 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
 8# 0x00007FC98752BBEF in /usr/lib/x86_64-linux-gnu/libgcc_s.so.1
 9# _Unwind_RaiseException in /usr/lib/x86_64-linux-gnu/libgcc_s.so.1
10# __cxa_throw in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
11# nvinfer1::Lobber<nvinfer1::InternalError>::operator()(char const*, char const*, int, int, nvinfer1::ErrorCode, char const*) in /usr/lib/x86_64-linux-gnu/libnvinfer.so.8
12# 0x00007FC9020EECBC in /usr/lib/x86_64-linux-gnu/libnvinfer.so.8
13# 0x00007FC902A7220F in /usr/lib/x86_64-linux-gnu/libnvinfer.so.8
14# 0x00007FC902A2862D in /usr/lib/x86_64-linux-gnu/libnvinfer.so.8
15# 0x00007FC902A7F653 in /usr/lib/x86_64-linux-gnu/libnvinfer.so.8
16# 0x00007FC9020EE715 in /usr/lib/x86_64-linux-gnu/libnvinfer.so.8
17# 0x00007FC901C8BAD0 in /usr/lib/x86_64-linux-gnu/libnvinfer.so.8
18# 0x00007FC9020F41F4 in /usr/lib/x86_64-linux-gnu/libnvinfer.so.8
19# 0x00007FC902913FD8 in /usr/lib/x86_64-linux-gnu/libnvinfer.so.8
20# 0x00007FC90291478C in /usr/lib/x86_64-linux-gnu/libnvinfer.so.8
21# 0x00007FC97A57C6D7 in /opt/tritonserver/backends/tensorrt/libtriton_tensorrt.so
22# 0x00007FC97A5855FE in /opt/tritonserver/backends/tensorrt/libtriton_tensorrt.so
23# TRITONBACKEND_ModelInstanceExecute in /opt/tritonserver/backends/tensorrt/libtriton_tensorrt.so
24# 0x00007FC987C1D73A in /opt/tritonserver/bin/…/lib/libtritonserver.so
25# 0x00007FC987C1E0F7 in /opt/tritonserver/bin/…/lib/libtritonserver.so
26# 0x00007FC987CDB411 in /opt/tritonserver/bin/…/lib/libtritonserver.so
27# 0x00007FC987C175C7 in /opt/tritonserver/bin/…/lib/libtritonserver.so
28# 0x00007FC98775DDE4 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
29# 0x00007FC98896D609 in /usr/lib/x86_64-linux-gnu/libpthread.so.0
30# clone in /usr/lib/x86_64-linux-gnu/libc.so.6
```
Triton server version: 22.05-py3 (Docker image), using the TensorRT backend. OS: Ubuntu 20.04.
How To Reproduce
We use trtexec to convert an ONNX model to a TensorRT engine (with maxShapes=1x80x12000), then put it into the Triton model repository. When we send dozens of requests with shapes close to the maximum, such as 1x80x11000 or 1x80x8000, while other model requests arrive at the same time (different gRPC clients in different processes — not multiprocessing, but multiple .py scripts running), Triton occasionally exits. A conversion command along these lines is sketched below.
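For illustration only, the conversion could look roughly like the following trtexec command. The input names audio_signal and length and the 1x80x12000 maximum shape come from the report above; the file names, the minShapes/optShapes values, and the --fp16 flag are assumptions, not the author's exact invocation:

```
# Hypothetical sketch -- file names and min/opt shape values are assumptions.
trtexec --onnx=model.onnx \
        --saveEngine=model.plan \
        --minShapes=audio_signal:1x80x100,length:1x1 \
        --optShapes=audio_signal:1x80x8000,length:1x1 \
        --maxShapes=audio_signal:1x80x12000,length:1x1 \
        --fp16
```

The resulting engine would then be placed in the model repository, e.g. under models/tensorrt_emb/1/model.plan for the tensorrt_emb model used in the client code.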
Top GitHub Comments
The TensorRT team seems to have a fix that resolves this issue. We are working with them to make the fix available to Triton users.
Yes, it’s related to request concurrency, and it seems to happen more often when there are many requests with close to the maximum shape. I checked with top, dmesg, and nvidia-smi; there appears to be no memory issue, either in CUDA memory (RTX 3090) or in system RAM. Model configuration for the model that triggers the bug:
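The actual config.pbtxt was not captured in this thread. Purely as a hedged sketch, a configuration for a TensorRT model with a variable-length input like the one described might look as follows; the model name and tensor names are taken from the client code above, while the output dims, data types, and instance settings are assumptions:

```
# Hypothetical config.pbtxt sketch -- not the author's actual configuration.
name: "tensorrt_emb"
platform: "tensorrt_plan"
max_batch_size: 0
input [
  {
    name: "audio_signal"
    data_type: TYPE_FP16
    dims: [ 1, 80, -1 ]   # -1 allows the variable time dimension up to maxShapes
  },
  {
    name: "length"
    data_type: TYPE_INT32
    dims: [ 1, 1 ]
  }
]
output [
  {
    name: "logits"
    data_type: TYPE_FP16
    dims: [ 1, -1 ]   # output shapes are assumptions
  },
  {
    name: "embs"
    data_type: TYPE_FP16
    dims: [ 1, -1 ]
  }
]
instance_group [
  { kind: KIND_GPU, count: 1 }
]
```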
There are two other models in the model repository; the total instance count is 3.
By the way, we also tried serving the ONNX model in Triton, and it works normally.