
Python backend segfault with detectron2

See original GitHub issue

Description

I am trying to get a detectron2 model running in Triton. In version 21.06 the server hangs while loading the model, and the kernel log shows the following error:

triton_python_b[23212]: segfault at 7f51a3e2666d ip 00007f51a29ed7e4 sp 00007ffe0a569868 error 4 in libc-2.31.so[7f51a2884000+178000]

# addr2line -p -a -f -e /usr/lib/x86_64-linux-gnu/libc-2.31.so 178000
0x0000000000178000: __nss_database_lookup at ??:?
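A side note on the symbolization above: in the kernel's segfault line, the bracketed `[7f51a2884000+178000]` is the mapping's start address plus its size, so the `0x178000` passed to addr2line is the mapping size rather than the fault offset (which is why it resolves to the unhelpful `__nss_database_lookup`). The offset to look up is `ip - base`. A quick sketch of the arithmetic, using the values from the dmesg line:

```python
# Values from the kernel log:
#   segfault at 7f51a3e2666d ip 00007f51a29ed7e4 ... in libc-2.31.so[7f51a2884000+178000]
ip = 0x7f51a29ed7e4      # faulting instruction pointer
base = 0x7f51a2884000    # start of the libc-2.31.so mapping
size = 0x178000          # size of the mapping (not an offset)

offset = ip - base       # offset of the faulting instruction within libc
print(hex(offset))       # 0x1697e4

# The faulting data address lies beyond the end of the libc mapping,
# consistent with an out-of-bounds read (error 4 = user-mode read fault):
fault_addr = 0x7f51a3e2666d
print(fault_addr > base + size)  # True
```

Running `addr2line -p -a -f -e /usr/lib/x86_64-linux-gnu/libc-2.31.so 1697e4` should then name the libc routine that actually faulted.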

In version 21.04 this error occurred intermittently at inference time; there is no output and the server hangs.

I0701 09:00:13.631002 1 infer_request.cc:497] prepared: [0x0x7fb864002d90] request id: 42, model: mag, requested version: 1, actual version: 1, flags: 0x0, correlation id: 0, batch size: 0, priority: 0, timeout (us): 0
original inputs:
[0x0x7fb864003248] input: INPUT0, type: UINT8, original shape: [7372,4146,3], batch + shape: [7372,4146,3], shape: [7372,4146,3]
override inputs:
inputs:
[0x0x7fb864003248] input: INPUT0, type: UINT8, original shape: [7372,4146,3], batch + shape: [7372,4146,3], shape: [7372,4146,3]
original requested outputs:
requested outputs:
OUTPUT0

I0701 09:00:13.631087 1 python.cc:347] model mag, instance mag_0, executing 1 requests

dmesg shows

triton_python_b[30191]: segfault at 7faf0dae966d ip 00007faf0c6b07e4 sp 00007ffe2c701488 error 4 in libc-2.31.so[7faf0c547000+178000]
[ 5125.940257] Code: 7f 07 c5 fe 7f 4f 20 c5 fe 7f 54 17 e0 c5 fe 7f 5c 17 c0 c5 f8 77 c3 48 39 f7 0f 87 ab 00 00 00 0f 84 e5 fe ff ff c5 fe 6f 26 <c5> fe 6f 6c 16 e0 c5 fe 6f 74 16 c0 c5 fe 6f 7c 16 a0 c5 7e 6f 44

This error occurs when using curl against the HTTP endpoint as well as with a Go gRPC client. It does not seem to occur when using the Python client or perf_analyzer. After the Python client has been used once, subsequent curl/gRPC requests work fine.

Triton Information

What version of Triton are you using? Tried on 21.04 and 21.06.

Are you using the Triton container or did you build it yourself? Using the Triton container with custom pip installs:

FROM nvcr.io/nvidia/tritonserver:21.04-py3 # or 21.06-py3

RUN apt-get update && apt-get install -y python3-dev
RUN python3 -m pip install --upgrade pip
RUN python3 -m pip install torch==1.8.1+cu111 torchvision==0.9.1+cu111 -f https://download.pytorch.org/whl/torch_stable.html
RUN python3 -m pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu111/torch1.8/index.html
RUN python3 -m pip install opencv-contrib-python-headless

To Reproduce

Steps to reproduce the behavior.

Describe the models (framework, inputs, outputs), ideally include the model configuration file (if using an ensemble include the model configuration file for that as well).

model config:

name: "mag"
backend: "python"

input [
  {
    name: "INPUT0"
    data_type: TYPE_UINT8
    dims: [ -1 , -1, 3 ]
  }
]

output [
  {
    name: "OUTPUT0"
    data_type: TYPE_FP32
    dims: [ -1, 4 ]
  }
]

instance_group [
  {
    count: 1
    kind: KIND_GPU
  }
]
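For reference, the dims above mean the model takes a single variable-size HxWx3 uint8 image and returns FP32 boxes of shape [-1, 4]. A minimal sketch of constructing a matching input (numpy only; the 7372x4146 dimensions are taken from the request log above, and the tritonclient call is left as a hypothetical comment since the report shows no client code):

```python
import numpy as np

# INPUT0: TYPE_UINT8, dims [-1, -1, 3] -> an HxWx3 uint8 image.
h, w = 7372, 4146                     # dimensions seen in the request log above
image = np.zeros((h, w, 3), dtype=np.uint8)

assert image.dtype == np.uint8 and image.shape == (h, w, 3)
print(image.nbytes)                   # 91692936 bytes, ~92 MB per request

# With the HTTP client this would be sent roughly as follows (hypothetical
# sketch, assuming tritonclient is installed and the server is on localhost:8000):
#   import tritonclient.http as httpclient
#   client = httpclient.InferenceServerClient("localhost:8000")
#   inp = httpclient.InferInput("INPUT0", image.shape, "UINT8")
#   inp.set_data_from_numpy(image)
#   result = client.infer("mag", [inp])
#   boxes = result.as_numpy("OUTPUT0")  # FP32, shape [-1, 4]
```

A payload this size exercises the Python backend's tensor-transfer path fairly hard, which may be relevant given that only some clients trigger the crash.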

model.py file: https://gist.github.com/bcvanmeurs/053d4443c9669b74eca742c71eaa4d1d

I can’t share our actual model, but I can create a dummy or supply a pretrained one if necessary.

Expected behavior

A consistent inference result, or a clearer error message at server startup (21.06).

Thanks in advance!

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments:13 (9 by maintainers)

Top GitHub Comments

1 reaction
Tabrizian commented on Jul 21, 2021

Thanks for getting back to us. I can confirm that I have received the email.

0 reactions
Tabrizian commented on Sep 22, 2021

Closing the ticket due to inactivity. Feel free to reopen if the issue persists.

Read more comments on GitHub

Top Results From Across the Web

Segmentation fault #297 - facebookresearch/detectron2 - GitHub
I run the demo: python demo/demo.py --config-file configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml \ --input ...
Docker container random segmentation fault - Stack Overflow
1. It looks like OpenCV is trying to use libgphoto2 as its camera backend, and things related to it are breaking. · Thank...
Source code for detectron2.engine.defaults
For python-based LazyConfig, use "path.key=value". ... "eval_only") and args.eval_only): torch.backends.cudnn.benchmark = _try_get_key( cfg, ...
EasyBuild v4.6.2 documentation (release 20221021.0)
add script to find dependencies of Python packages (#3839); add ai default module ... also build BLIS backend for FlexiBLAS v3.0.4 with GCC/10.3.0...
Detectron2 - How to use Instance Image Segmentation for ...
This tutorial teaches you how to implement instance image segmentation with a real use case.
