Python backend segfault with detectron2
Description
I am trying to get a detectron2 model running in Triton. In version 21.06 the server hangs on loading the model and displays the following error:
triton_python_b[23212]: segfault at 7f51a3e2666d ip 00007f51a29ed7e4 sp 00007ffe0a569868 error 4 in libc-2.31.so[7f51a2884000+178000]
# addr2line -p -a -f -e /usr/lib/x86_64-linux-gnu/libc-2.31.so 178000
0x0000000000178000: __nss_database_lookup at ??:?
In version 21.04 this error occurred sometimes at inference time; there is no output and the server hangs.
I0701 09:00:13.631002 1 infer_request.cc:497] prepared: [0x0x7fb864002d90] request id: 42, model: mag, requested version: 1, actual version: 1, flags: 0x0, correlation id: 0, batch size: 0, priority: 0, timeout (us): 0
original inputs:
[0x0x7fb864003248] input: INPUT0, type: UINT8, original shape: [7372,4146,3], batch + shape: [7372,4146,3], shape: [7372,4146,3]
override inputs:
inputs:
[0x0x7fb864003248] input: INPUT0, type: UINT8, original shape: [7372,4146,3], batch + shape: [7372,4146,3], shape: [7372,4146,3]
original requested outputs:
requested outputs:
OUTPUT0
I0701 09:00:13.631087 1 python.cc:347] model mag, instance mag_0, executing 1 requests
dmesg shows:
triton_python_b[30191]: segfault at 7faf0dae966d ip 00007faf0c6b07e4 sp 00007ffe2c701488 error 4 in libc-2.31.so[7faf0c547000+178000]
[ 5125.940257] Code: 7f 07 c5 fe 7f 4f 20 c5 fe 7f 54 17 e0 c5 fe 7f 5c 17 c0 c5 f8 77 c3 48 39 f7 0f 87 ab 00 00 00 0f 84 e5 fe ff ff c5 fe 6f 26 <c5> fe 6f 6c 16 e0 c5 fe 6f 74 16 c0 c5 fe 6f 7c 16 a0 c5 7e 6f 44
This error occurs when using curl against the HTTP endpoint as well as with a Go gRPC client. It does not seem to occur when using the Python client or the perf_analyzer. After using the Python client once, subsequent curl/gRPC requests work fine.
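For reference, the Python client call that works reliably looks roughly like the sketch below (the server URL and the random test image are assumptions for illustration; the model name, dtype, and shape follow the config and the log above):

import numpy as np
import tritonclient.http as httpclient

# Connect to the HTTP endpoint of the local Triton server (URL is an assumption).
client = httpclient.InferenceServerClient(url="localhost:8000")

# Dummy uint8 image with the same shape as the request shown in the log above.
image = np.random.randint(0, 256, (7372, 4146, 3), dtype=np.uint8)

infer_input = httpclient.InferInput("INPUT0", list(image.shape), "UINT8")
infer_input.set_data_from_numpy(image, binary_data=True)

result = client.infer(
    model_name="mag",
    inputs=[infer_input],
    outputs=[httpclient.InferRequestedOutput("OUTPUT0")],
)
boxes = result.as_numpy("OUTPUT0")  # per the config, shape is [-1, 4]
print(boxes.shape)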
Triton Information
What version of Triton are you using? Tried on 21.04 and 21.06.
Are you using the Triton container or did you build it yourself? Using the Triton container with custom pip installs:
# base image: 21.04-py3 or 21.06-py3
FROM nvcr.io/nvidia/tritonserver:21.04-py3
RUN apt-get update && apt-get install -y python3-dev
RUN python3 -m pip install --upgrade pip
RUN python3 -m pip install torch==1.8.1+cu111 torchvision==0.9.1+cu111 -f https://download.pytorch.org/whl/torch_stable.html
RUN python3 -m pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu111/torch1.8/index.html
RUN python3 -m pip install opencv-contrib-python-headless
To Reproduce
Steps to reproduce the behavior. Describe the models (framework, inputs, outputs), ideally include the model configuration file (if using an ensemble include the model configuration file for that as well).
model config:
name: "mag"
backend: "python"
input [
{
name: "INPUT0"
data_type: TYPE_UINT8
dims: [ -1 , -1, 3 ]
}
]
output [
{
name: "OUTPUT0"
data_type: TYPE_FP32
dims: [ -1, 4 ]
}
]
instance_group [
{
count: 1
kind: KIND_GPU
}
]
model.py file: https://gist.github.com/bcvanmeurs/053d4443c9669b74eca742c71eaa4d1d
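For context, a Triton Python backend model.py wrapping a detectron2 predictor generally has the shape sketched below; the COCO config file, score threshold, and box extraction here are illustrative assumptions, not the exact contents of the gist:

import numpy as np
import triton_python_backend_utils as pb_utils
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

class TritonPythonModel:
    def initialize(self, args):
        # Load a detectron2 model once per model instance (config/weights are assumptions).
        cfg = get_cfg()
        cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"))
        cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml")
        cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5
        self.predictor = DefaultPredictor(cfg)

    def execute(self, requests):
        responses = []
        for request in requests:
            # INPUT0 is an HxWx3 uint8 image, as declared in config.pbtxt.
            image = pb_utils.get_input_tensor_by_name(request, "INPUT0").as_numpy()
            outputs = self.predictor(image)
            boxes = outputs["instances"].pred_boxes.tensor.cpu().numpy()
            out_tensor = pb_utils.Tensor("OUTPUT0", boxes.astype(np.float32))
            responses.append(pb_utils.InferenceResponse(output_tensors=[out_tensor]))
        return responses

    def finalize(self):
        self.predictor = None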
I can’t share our actual model, but I can create a dummy or supply a pretrained one if necessary.
Expected behavior
A consistent inference result, or a clearer error message at server startup (21.06).
Thanks in advance!
Thanks for getting back to us. I can confirm that I have received the email.
Closing the ticket due to inactivity. Feel free to re-open if the issue persists.