
Triton terminated with Signal (6)

See original GitHub issue

When using the Triton gRPC client for inference, Triton sometimes exits unexpectedly, for example with client code like:

import numpy as np
import tritonclient.grpc as grpcclient
from tritonclient.utils import np_to_triton_dtype
from torch.cuda.amp import autocast

# data_loader, preprocessor and device are defined elsewhere in the script
with grpcclient.InferenceServerClient('localhost:8001', verbose=False) as client:
    outputs = [
        grpcclient.InferRequestedOutput('logits'),
        grpcclient.InferRequestedOutput('embs')
    ]

    # data_loader is a torch dataloader with 4 workers
    for sent_count, test_batch in enumerate(data_loader):
        with autocast():
            processed_signal, processed_signal_length = preprocessor(
                input_signal=test_batch[0].to(device),
                length=test_batch[1].to(device)
            )
        inputs = [
            grpcclient.InferInput("audio_signal", list(processed_signal.shape), "FP16"),
            grpcclient.InferInput("length", [1, 1], np_to_triton_dtype(np.int32))
        ]
        inputs[0].set_data_from_numpy(processed_signal.cpu().numpy().astype(np.float16))
        inputs[1].set_data_from_numpy(processed_signal_length.cpu().numpy().astype(np.int32).reshape(1, 1))
        result = client.infer(model_name="tensorrt_emb",
                              inputs=inputs,
                              outputs=outputs)

and the tritonserver output:

terminate called after throwing an instance of 'nvinfer1::InternalError'
  what():  Assertion mUsedAllocators.find(alloc) != mUsedAllocators.end() && "Myelin free callback called with invalid MyelinAllocator" failed.
Signal (6) received.
 0# 0x00005602FC4F21B9 in tritonserver
 1# 0x00007FC98736C0C0 in /usr/lib/x86_64-linux-gnu/libc.so.6
 2# gsignal in /usr/lib/x86_64-linux-gnu/libc.so.6
 3# abort in /usr/lib/x86_64-linux-gnu/libc.so.6
 4# 0x00007FC987725911 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
 5# 0x00007FC98773138C in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
 6# 0x00007FC987730369 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
 7# __gxx_personality_v0 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
 8# 0x00007FC98752BBEF in /usr/lib/x86_64-linux-gnu/libgcc_s.so.1
 9# _Unwind_RaiseException in /usr/lib/x86_64-linux-gnu/libgcc_s.so.1
10# __cxa_throw in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
11# nvinfer1::Lobber<nvinfer1::InternalError>::operator()(char const*, char const*, int, int, nvinfer1::ErrorCode, char const*) in /usr/lib/x86_64-linux-gnu/libnvinfer.so.8
12# 0x00007FC9020EECBC in /usr/lib/x86_64-linux-gnu/libnvinfer.so.8
13# 0x00007FC902A7220F in /usr/lib/x86_64-linux-gnu/libnvinfer.so.8
14# 0x00007FC902A2862D in /usr/lib/x86_64-linux-gnu/libnvinfer.so.8
15# 0x00007FC902A7F653 in /usr/lib/x86_64-linux-gnu/libnvinfer.so.8
16# 0x00007FC9020EE715 in /usr/lib/x86_64-linux-gnu/libnvinfer.so.8
17# 0x00007FC901C8BAD0 in /usr/lib/x86_64-linux-gnu/libnvinfer.so.8
18# 0x00007FC9020F41F4 in /usr/lib/x86_64-linux-gnu/libnvinfer.so.8
19# 0x00007FC902913FD8 in /usr/lib/x86_64-linux-gnu/libnvinfer.so.8
20# 0x00007FC90291478C in /usr/lib/x86_64-linux-gnu/libnvinfer.so.8
21# 0x00007FC97A57C6D7 in /opt/tritonserver/backends/tensorrt/libtriton_tensorrt.so
22# 0x00007FC97A5855FE in /opt/tritonserver/backends/tensorrt/libtriton_tensorrt.so
23# TRITONBACKEND_ModelInstanceExecute in /opt/tritonserver/backends/tensorrt/libtriton_tensorrt.so
24# 0x00007FC987C1D73A in /opt/tritonserver/bin/…/lib/libtritonserver.so
25# 0x00007FC987C1E0F7 in /opt/tritonserver/bin/…/lib/libtritonserver.so
26# 0x00007FC987CDB411 in /opt/tritonserver/bin/…/lib/libtritonserver.so
27# 0x00007FC987C175C7 in /opt/tritonserver/bin/…/lib/libtritonserver.so
28# 0x00007FC98775DDE4 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
29# 0x00007FC98896D609 in /usr/lib/x86_64-linux-gnu/libpthread.so.0
30# clone in /usr/lib/x86_64-linux-gnu/libc.so.6

Triton Server version: 22.05-py3 (Docker image), using the TensorRT backend. OS: Ubuntu 20.04.

How To Reproduce

We use trtexec to convert an ONNX model to a TensorRT engine (with maxShapes=1x80x12000) and put it into the Triton model repository. When we send dozens of requests with shape 1x80x11000 (or similar lengths such as 8000) while other model requests arrive at the same time (different gRPC clients in different processes; not multiprocessing, but multiple .py scripts running), Triton exits by chance.
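For reference, here is a minimal sketch of that concurrent-client setup. The model name, input names, and shapes are taken from the report above; the random input data, the request count, and the use of multiprocessing to stand in for several separate .py scripts are assumptions:

import multiprocessing as mp

import numpy as np
import tritonclient.grpc as grpcclient
from tritonclient.utils import np_to_triton_dtype


def send_requests(n_requests=50, time_len=11000):
    # Each worker opens its own gRPC connection and sends requests with a
    # near-maximum dynamic shape (the engine was built with maxShapes=1x80x12000).
    with grpcclient.InferenceServerClient('localhost:8001') as client:
        outputs = [
            grpcclient.InferRequestedOutput('logits'),
            grpcclient.InferRequestedOutput('embs')
        ]
        for _ in range(n_requests):
            signal = np.random.randn(1, 80, time_len).astype(np.float16)
            length = np.array([[time_len]], dtype=np.int32)

            inputs = [
                grpcclient.InferInput("audio_signal", list(signal.shape), "FP16"),
                grpcclient.InferInput("length", [1, 1], np_to_triton_dtype(np.int32))
            ]
            inputs[0].set_data_from_numpy(signal)
            inputs[1].set_data_from_numpy(length)

            client.infer(model_name="tensorrt_emb", inputs=inputs, outputs=outputs)


if __name__ == "__main__":
    # The original report ran multiple independent .py clients; separate
    # processes are the closest single-file approximation of that.
    workers = [mp.Process(target=send_requests) for _ in range(4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()

The trtexec conversion step itself is not shown; only the client side is sketched.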

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

3 reactions
tanmayv25 commented, Jul 6, 2022

The TensorRT team seems to have a fix that can resolve this issue. We are working with them to make the fix available to Triton users.

1 reaction
erichtho commented, Jul 6, 2022

Yes, it's related to request concurrency. And it seems to appear more often when there are lots of requests with close to the maximum shape. I checked with top, dmesg, and nvidia-smi, and there seems to be no memory issue, neither CUDA memory (RTX 3090) nor system RAM (a scripted version of that check is sketched at the end of this comment). Model configuration (the one for which the bug is reported):

name: "tensorrt_emb"
backend: "tensorrt"
max_batch_size: 1

input [
  {
    name: "audio_signal"
    data_type: TYPE_FP32
    dims: [80, -1]
  }
]
input [
  {
    name: "length"
    data_type: TYPE_INT32
    dims: [ 1 ]
    reshape: { shape: [ ] }
  }
]

output [
  {
    name: "logits"
    data_type: TYPE_FP16
    dims: [ -1 ]
  }
]
output [
  {
    name: "embs"
    data_type: TYPE_FP16
    dims: [ -1 ]
  }
]

instance_group [
  {
    count: 3
    kind: KIND_GPU
  }
]

dynamic_batching {
  preferred_batch_size: [1]
  max_queue_delay_microseconds: 1
  preserve_ordering: true
}

There are two other models in the model repository, with a total instance count of 3.

By the way, we also tried Triton with the ONNX version of the model, and it behaves normally.
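For reference, a minimal sketch of how that memory check could be scripted instead of watching nvidia-smi by hand. It assumes the pynvml Python bindings are available; the device index and polling interval are arbitrary:

import time
import pynvml

# Poll GPU memory while the clients run, to confirm there is no creeping
# CUDA memory usage before the crash.
pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # the RTX 3090 in this report

try:
    while True:
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
        print(f"used {mem.used / 2**20:.0f} MiB / total {mem.total / 2**20:.0f} MiB")
        time.sleep(0.5)
except KeyboardInterrupt:
    pynvml.nvmlShutdown()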

Read more comments on GitHub >
