
Python backend segfault with detectron2

See original GitHub issue

Description

I am trying to get a detectron2 model running in Triton. In version 21.06 the server hangs while loading the model, and the kernel log shows the following error:

triton_python_b[23212]: segfault at 7f51a3e2666d ip 00007f51a29ed7e4 sp 00007ffe0a569868 error 4 in libc-2.31.so[7f51a2884000+178000]

# addr2line -p -a -f -e /usr/lib/x86_64-linux-gnu/libc-2.31.so 178000
0x0000000000178000: __nss_database_lookup at ??:?
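A side note on the symbolization above: in the kernel's segfault line, the bracketed `[7f51a2884000+178000]` is the mapping's start address plus its size, so the `0x178000` passed to addr2line is the mapping size rather than the fault offset (which is why it resolves to the unhelpful `__nss_database_lookup`). The offset to look up is `ip - base`. A quick sketch of the arithmetic, using the values from the dmesg line:

```python
# Values from the kernel log:
#   segfault at 7f51a3e2666d ip 00007f51a29ed7e4 ... in libc-2.31.so[7f51a2884000+178000]
ip = 0x7f51a29ed7e4      # faulting instruction pointer
base = 0x7f51a2884000    # start of the libc-2.31.so mapping
size = 0x178000          # size of the mapping (not an offset)

offset = ip - base       # offset of the faulting instruction within libc
print(hex(offset))       # 0x1697e4

# The faulting data address lies beyond the end of the libc mapping,
# consistent with an out-of-bounds read (error 4 = user-mode read fault):
fault_addr = 0x7f51a3e2666d
print(fault_addr > base + size)  # True
```

Running `addr2line -p -a -f -e /usr/lib/x86_64-linux-gnu/libc-2.31.so 1697e4` should then name the libc routine that actually faulted.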

In version 21.04 this error occurred intermittently at inference time; there is no output and the server hangs.

I0701 09:00:13.631002 1 infer_request.cc:497] prepared: [0x0x7fb864002d90] request id: 42, model: mag, requested version: 1, actual version: 1, flags: 0x0, correlation id: 0, batch size: 0, priority: 0, timeout (us): 0
original inputs:
[0x0x7fb864003248] input: INPUT0, type: UINT8, original shape: [7372,4146,3], batch + shape: [7372,4146,3], shape: [7372,4146,3]
override inputs:
inputs:
[0x0x7fb864003248] input: INPUT0, type: UINT8, original shape: [7372,4146,3], batch + shape: [7372,4146,3], shape: [7372,4146,3]
original requested outputs:
requested outputs:
OUTPUT0

I0701 09:00:13.631087 1 python.cc:347] model mag, instance mag_0, executing 1 requests

dmesg shows

triton_python_b[30191]: segfault at 7faf0dae966d ip 00007faf0c6b07e4 sp 00007ffe2c701488 error 4 in libc-2.31.so[7faf0c547000+178000]
[ 5125.940257] Code: 7f 07 c5 fe 7f 4f 20 c5 fe 7f 54 17 e0 c5 fe 7f 5c 17 c0 c5 f8 77 c3 48 39 f7 0f 87 ab 00 00 00 0f 84 e5 fe ff ff c5 fe 6f 26 <c5> fe 6f 6c 16 e0 c5 fe 6f 74 16 c0 c5 fe 6f 7c 16 a0 c5 7e 6f 44

This error occurs when using curl against the HTTP endpoint as well as with a Go gRPC client. It does not seem to occur when using the Python client or perf_analyzer. After the Python client has been used once, subsequent curl/gRPC requests work fine.

Triton Information

What version of Triton are you using? Tried on 21.04 and 21.06.

Are you using the Triton container or did you build it yourself? Using the Triton container with custom pip installs:

FROM nvcr.io/nvidia/tritonserver:21.04-py3 # or 21.06-py3

RUN apt-get update && apt-get install -y python3-dev
RUN python3 -m pip install --upgrade pip
RUN python3 -m pip install torch==1.8.1+cu111 torchvision==0.9.1+cu111 -f https://download.pytorch.org/whl/torch_stable.html
RUN python3 -m pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu111/torch1.8/index.html
RUN python3 -m pip install opencv-contrib-python-headless

To Reproduce

Steps to reproduce the behavior.

Describe the models (framework, inputs, outputs), ideally include the model configuration file (if using an ensemble include the model configuration file for that as well).

model config:

name: "mag"
backend: "python"

input [
  {
    name: "INPUT0"
    data_type: TYPE_UINT8
    dims: [ -1 , -1, 3 ]
  }
]

output [
  {
    name: "OUTPUT0"
    data_type: TYPE_FP32
    dims: [ -1, 4 ]
  }
]

instance_group [
  {
    count: 1
    kind: KIND_GPU
  }
]
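For reference, the dims above mean the model takes a single variable-size HxWx3 uint8 image and returns FP32 boxes of shape [-1, 4]. A minimal sketch of constructing a matching input (numpy only; the 7372x4146 dimensions are taken from the request log above, and the tritonclient call is left as a hypothetical comment since the report shows no client code):

```python
import numpy as np

# INPUT0: TYPE_UINT8, dims [-1, -1, 3] -> an HxWx3 uint8 image.
h, w = 7372, 4146                     # dimensions seen in the request log above
image = np.zeros((h, w, 3), dtype=np.uint8)

assert image.dtype == np.uint8 and image.shape == (h, w, 3)
print(image.nbytes)                   # 91692936 bytes, ~92 MB per request

# With the HTTP client this would be sent roughly as follows (hypothetical
# sketch, assuming tritonclient is installed and the server is on localhost:8000):
#   import tritonclient.http as httpclient
#   client = httpclient.InferenceServerClient("localhost:8000")
#   inp = httpclient.InferInput("INPUT0", image.shape, "UINT8")
#   inp.set_data_from_numpy(image)
#   result = client.infer("mag", [inp])
#   boxes = result.as_numpy("OUTPUT0")  # FP32, shape [-1, 4]
```

A payload this size exercises the Python backend's tensor-transfer path fairly hard, which may be relevant given that only some clients trigger the crash.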

model.py file: https://gist.github.com/bcvanmeurs/053d4443c9669b74eca742c71eaa4d1d

I can’t share our actual model, but I can create a dummy or supply a pretrained one if necessary.

Expected behavior

A consistent inference result, or a clearer error message at server startup (21.06).

Thanks in advance!

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments:13 (9 by maintainers)

Top GitHub Comments

1 reaction
Tabrizian commented on Jul 21, 2021

Thanks for getting back to us. I can confirm that I have received the email.

0 reactions
Tabrizian commented on Sep 22, 2021

Closing the ticket due to inactivity. Feel free to reopen if the issue persists.

Read more comments on GitHub

Top Results From Across the Web

Segmentation fault #297 - facebookresearch/detectron2 - GitHub
I run the demo: python demo/demo.py --config-file configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml \ --input ...
Docker container random segmentation fault - Stack Overflow
1. It looks like OpenCV is trying to use libgphoto2 as its camera backend, and things related to it are breaking. · Thank...
Source code for detectron2.engine.defaults
For python-based LazyConfig, use "path.key=value". ... "eval_only") and args.eval_only): torch.backends.cudnn.benchmark = _try_get_key( cfg, ...
EasyBuild v4.6.2 documentation (release 20221021.0)
add script to find dependencies of Python packages (#3839); add ai default module ... also build BLIS backend for FlexiBLAS v3.0.4 with GCC/10.3.0...
Detectron2 - How to use Instance Image Segmentation for ...
This tutorial teaches you how to implement instance image segmentation with a real use case.
