Triton server always crashes during stress test.

Description

Triton Information: 22.04

Are you using the Triton container or did you build it yourself?

  • nvcr.io/nvidia/tritonserver:22.02-py3
  • nvcr.io/nvidia/tritonserver:22.04-py3
  • nvcr.io/nvidia/tritonserver:22.07-py3

To Reproduce

Steps to reproduce the behavior.

Describe the models (framework, inputs, outputs), ideally include the model configuration file (if using an ensemble include the model configuration file for that as well).

I’m running tritonserver with 3 TensorRT models and 3 Python backend models, combined in an ensemble.

Use more than 8 clients with 8 gRPC connections, and the server crashes.

I0805 05:32:41.580360 1 server.cc:576]
+-------------+-------------------------------------------------------------------------+--------------------------------------------------+
| Backend     | Path                                                                    | Config                                           |
+-------------+-------------------------------------------------------------------------+--------------------------------------------------+
| pytorch     | /opt/tritonserver/backends/pytorch/libtriton_pytorch.so                 | {}                                               |
| onnxruntime | /opt/tritonserver/backends/onnxruntime/libtriton_onnxruntime.so         | {}                                               |
| openvino    | /opt/tritonserver/backends/openvino_2021_4/libtriton_openvino_2021_4.so | {}                                               |
| tensorflow  | /opt/tritonserver/backends/tensorflow1/libtriton_tensorflow1.so         | {}                                               |
| python      | /opt/tritonserver/backends/python/libtriton_python.so                   | {"cmdline":{"shm-default-byte-size":"16777216"}} |
| tensorrt    | /opt/tritonserver/backends/tensorrt/libtriton_tensorrt.so               | {}                                               |
+-------------+-------------------------------------------------------------------------+--------------------------------------------------+
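
The load pattern described above (more than 8 concurrent clients) can be sketched as a small harness. This is only an illustration, not the reporter's actual test: `send_request` is a hypothetical stand-in that sleeps instead of calling the server, and the issue does not say how connections map to clients, so the sketch assumes one thread per client. In a real run, `send_request` would issue one gRPC inference request to the ensemble.

```python
import concurrent.futures
import time


def send_request(client_id: int, request_id: int) -> bool:
    """Hypothetical stand-in for one inference call.

    A real stress test would send a gRPC inference request to the
    ensemble model here. This stub only sleeps so the sketch runs anywhere.
    """
    time.sleep(0.001)
    return True


def run_client(client_id: int, requests_per_client: int) -> dict:
    """Send requests sequentially from one simulated client and count outcomes."""
    ok = 0
    failed = 0
    for request_id in range(requests_per_client):
        try:
            if send_request(client_id, request_id):
                ok += 1
            else:
                failed += 1
        except Exception:
            # A dropped connection (for example, a crashed server) counts as a failure.
            failed += 1
    return {"client": client_id, "ok": ok, "failed": failed}


def stress(num_clients: int = 9, requests_per_client: int = 20) -> list:
    """Run all simulated clients concurrently, one thread per client."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=num_clients) as pool:
        futures = [
            pool.submit(run_client, client_id, requests_per_client)
            for client_id in range(num_clients)
        ]
        return [future.result() for future in futures]


if __name__ == "__main__":
    results = stress()
    total_failed = sum(result["failed"] for result in results)
    print(f"clients={len(results)} failed={total_failed}")
```

Raising `num_clients` step by step (8, 9, 10, ...) is one way to find the concurrency level at which failures start to appear.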

Expected behavior

The server should not crash on its own.

Crash logs

Signal (11) received.
 0# 0x0000563CE55C17E9 in tritonserver
 1# 0x00007FE5667BA0C0 in /usr/lib/x86_64-linux-gnu/libc.so.6
 2# abort in /usr/lib/x86_64-linux-gnu/libc.so.6
 3# 0x00007FE566B73911 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
 4# 0x00007FE566B7F38C in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
 5# 0x00007FE566B7E369 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
 6# __gxx_personality_v0 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
 7# 0x00007FE566979BEF in /usr/lib/x86_64-linux-gnu/libgcc_s.so.1
 8# _Unwind_RaiseException in /usr/lib/x86_64-linux-gnu/libgcc_s.so.1
 9# __cxa_throw in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
10# nvinfer1::Lobber<nvinfer1::InternalError>::operator()(char const*, char const*, int, int, nvinfer1::ErrorCode, char const*) in /usr/lib/x86_64-linux-gnu/libnvinfer.so.8
11# 0x00007FE5138650FC in /usr/lib/x86_64-linux-gnu/libnvinfer.so.8
12# 0x00007FE5140AFDCF in /usr/lib/x86_64-linux-gnu/libnvinfer.so.8
13# 0x00007FE5140661ED in /usr/lib/x86_64-linux-gnu/libnvinfer.so.8
14# 0x00007FE5140BD213 in /usr/lib/x86_64-linux-gnu/libnvinfer.so.8
15# 0x00007FE513864B55 in /usr/lib/x86_64-linux-gnu/libnvinfer.so.8
16# 0x00007FE513401F90 in /usr/lib/x86_64-linux-gnu/libnvinfer.so.8
17# 0x00007FE51386A634 in /usr/lib/x86_64-linux-gnu/libnvinfer.so.8
18# 0x00007FE513F51B98 in /usr/lib/x86_64-linux-gnu/libnvinfer.so.8
19# 0x00007FE513F5234C in /usr/lib/x86_64-linux-gnu/libnvinfer.so.8
20# 0x00007FE445581397 in /opt/tritonserver/backends/tensorrt/libtriton_tensorrt.so
21# 0x00007FE44558A2BE in /opt/tritonserver/backends/tensorrt/libtriton_tensorrt.so
22# TRITONBACKEND_ModelInstanceExecute in /opt/tritonserver/backends/tensorrt/libtriton_tensorrt.so
23# 0x00007FE567066D9A in /opt/tritonserver/bin/../lib/libtritonserver.so
24# 0x00007FE567067757 in /opt/tritonserver/bin/../lib/libtritonserver.so
25# 0x00007FE567122AB1 in /opt/tritonserver/bin/../lib/libtritonserver.so
26# 0x00007FE567060C27 in /opt/tritonserver/bin/../lib/libtritonserver.so
27# 0x00007FE566BABDE4 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
28# 0x00007FE567DB2609 in /usr/lib/x86_64-linux-gnu/libpthread.so.0
29# clone in /usr/lib/x86_64-linux-gnu/libc.so.6

 0# 0x000055773616D7E9 in tritonserver
 1# 0x00007F517560B0C0 in /usr/lib/x86_64-linux-gnu/libc.so.6
 2# 0x00007F51226B6230 in /usr/lib/x86_64-linux-gnu/libnvinfer.so.8
 3# 0x00007F5122FB9C23 in /usr/lib/x86_64-linux-gnu/libnvinfer.so.8
 4# 0x00007F5122EFE39A in /usr/lib/x86_64-linux-gnu/libnvinfer.so.8
 5# 0x00007F5122EBAEED in /usr/lib/x86_64-linux-gnu/libnvinfer.so.8
 6# 0x00007F5122F0DD83 in /usr/lib/x86_64-linux-gnu/libnvinfer.so.8
 7# 0x00007F51226B5ABD in /usr/lib/x86_64-linux-gnu/libnvinfer.so.8
 8# 0x00007F51226BA8E3 in /usr/lib/x86_64-linux-gnu/libnvinfer.so.8
 9# 0x00007F5122DA2B98 in /usr/lib/x86_64-linux-gnu/libnvinfer.so.8
10# 0x00007F5122DA334C in /usr/lib/x86_64-linux-gnu/libnvinfer.so.8
11# 0x00007F50C033A397 in /opt/tritonserver/backends/tensorrt/libtriton_tensorrt.so
12# 0x00007F50C03432BE in /opt/tritonserver/backends/tensorrt/libtriton_tensorrt.so
13# TRITONBACKEND_ModelInstanceExecute in /opt/tritonserver/backends/tensorrt/libtriton_tensorrt.so
14# 0x00007F5175EB7D9A in /opt/tritonserver/bin/../lib/libtritonserver.so
15# 0x00007F5175EB8757 in /opt/tritonserver/bin/../lib/libtritonserver.so
16# 0x00007F5175F73AB1 in /opt/tritonserver/bin/../lib/libtritonserver.so
17# 0x00007F5175EB1C27 in /opt/tritonserver/bin/../lib/libtritonserver.so
18# 0x00007F51759FCDE4 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
19# 0x00007F5176C03609 in /usr/lib/x86_64-linux-gnu/libpthread.so.0
20# clone in /usr/lib/x86_64-linux-gnu/libc.so.6

 0# 0x000055A87AE53C19 in tritonserver
 1# 0x00007F13BAACC090 in /usr/lib/x86_64-linux-gnu/libc.so.6
 2# 0x00007F13BAC149C5 in /usr/lib/x86_64-linux-gnu/libc.so.6
 3# std::basic_streambuf<char, std::char_traits<char> >::xsputn(char const*, long) in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
 4# std::basic_ostream<char, std::char_traits<char> >& std::__ostream_insert<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*, long) in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
 5# 0x00007F13BB38DBB8 in /opt/tritonserver/bin/../lib/libtritonserver.so
 6# 0x00007F13BAEBDDE4 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
 7# 0x00007F13BC0D4609 in /usr/lib/x86_64-linux-gnu/libpthread.so.0
 8# clone in /usr/lib/x86_64-linux-gnu/libc.so.6

 0# 0x000055CB56D98C19 in tritonserver
 1# 0x00007FBBD799A090 in /usr/lib/x86_64-linux-gnu/libc.so.6
 2# gsignal in /usr/lib/x86_64-linux-gnu/libc.so.6
 3# abort in /usr/lib/x86_64-linux-gnu/libc.so.6
 4# 0x00007FBBD7D53911 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
 5# 0x00007FBBD7D5F38C in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
 6# 0x00007FBBD7D5E369 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
 7# __gxx_personality_v0 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
 8# 0x00007FBBD7B59BEF in /usr/lib/x86_64-linux-gnu/libgcc_s.so.1
 9# _Unwind_RaiseException in /usr/lib/x86_64-linux-gnu/libgcc_s.so.1
10# __cxa_throw in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
11# 0x00007FBADB4FFAA5 in /usr/lib/x86_64-linux-gnu/libnvinfer.so.8
12# 0x00007FBADB50F47C in /usr/lib/x86_64-linux-gnu/libnvinfer.so.8
13# 0x00007FBADBE0AB6F in /usr/lib/x86_64-linux-gnu/libnvinfer.so.8
14# 0x00007FBADBDC5F2D in /usr/lib/x86_64-linux-gnu/libnvinfer.so.8
15# 0x00007FBADBE1D472 in /usr/lib/x86_64-linux-gnu/libnvinfer.so.8
16# 0x00007FBADB50EFC5 in /usr/lib/x86_64-linux-gnu/libnvinfer.so.8
17# 0x00007FBADB0980E0 in /usr/lib/x86_64-linux-gnu/libnvinfer.so.8
18# 0x00007FBADB5142A4 in /usr/lib/x86_64-linux-gnu/libnvinfer.so.8
19# 0x00007FBADBBF9BC1 in /usr/lib/x86_64-linux-gnu/libnvinfer.so.8
20# 0x00007FBADBBFA38C in /usr/lib/x86_64-linux-gnu/libnvinfer.so.8
21# 0x00007FBBCC39C7B7 in /opt/tritonserver/backends/tensorrt/libtriton_tensorrt.so
22# 0x00007FBBCC3A538E in /opt/tritonserver/backends/tensorrt/libtriton_tensorrt.so
23# TRITONBACKEND_ModelInstanceExecute in /opt/tritonserver/backends/tensorrt/libtriton_tensorrt.so
24# 0x00007FBBD824B1EA in /opt/tritonserver/bin/../lib/libtritonserver.so
25# 0x00007FBBD824B917 in /opt/tritonserver/bin/../lib/libtritonserver.so
26# 0x00007FBBD830DF51 in /opt/tritonserver/bin/../lib/libtritonserver.so
27# 0x00007FBBD8245787 in /opt/tritonserver/bin/../lib/libtritonserver.so
28# 0x00007FBBD7D8BDE4 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
29# 0x00007FBBD8FA2609 in /usr/lib/x86_64-linux-gnu/libpthread.so.0
30# clone in /usr/lib/x86_64-linux-gnu/libc.so.6
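
Traces like the ones above are easier to compare once the frames are grouped by shared library. The following is a rough sketch, assuming the `N# symbol in module` line layout seen in these logs; the function names are made up for illustration, and `SAMPLE` is a four-line excerpt of the first trace rather than the full log.

```python
import re
from collections import Counter

# Matches lines such as " 9# __cxa_throw in /usr/lib/.../libstdc++.so.6".
FRAME_RE = re.compile(r"^\s*(\d+)#\s+(.+?)\s+in\s+(\S+)\s*$")

# Excerpt of the first trace in the issue (not the full log).
SAMPLE = """\
 0# 0x0000563CE55C17E9 in tritonserver
 9# __cxa_throw in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
10# nvinfer1::Lobber<nvinfer1::InternalError>::operator()(char const*, char const*, int, int, nvinfer1::ErrorCode, char const*) in /usr/lib/x86_64-linux-gnu/libnvinfer.so.8
22# TRITONBACKEND_ModelInstanceExecute in /opt/tritonserver/backends/tensorrt/libtriton_tensorrt.so
"""


def parse_frames(trace: str) -> list:
    """Return (index, symbol, module) tuples for every frame line in the trace."""
    frames = []
    for line in trace.splitlines():
        match = FRAME_RE.match(line)
        if match:
            index, symbol, module = match.groups()
            frames.append((int(index), symbol, module))
    return frames


def count_by_module(frames: list) -> Counter:
    """Count frames per shared library, keyed by file name without the directory."""
    return Counter(module.rsplit("/", 1)[-1] for _, _, module in frames)


def named_symbols(frames: list) -> list:
    """Keep only frames whose symbol was resolved (not a raw 0x... address)."""
    return [symbol for _, symbol, _ in frames if not symbol.startswith("0x")]


if __name__ == "__main__":
    frames = parse_frames(SAMPLE)
    for module, count in count_by_module(frames).most_common():
        print(f"{count:3d}  {module}")
    for symbol in named_symbols(frames):
        print(symbol)
```

Run against the full traces, the resolved symbols show the pattern visible by eye: in two of the four traces an exception is thrown inside `libnvinfer.so.8` (`__cxa_throw`) beneath `TRITONBACKEND_ModelInstanceExecute` and ends in `abort`.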

Top GitHub Comments

kimdw commented, Sep 22, 2022

Finally fixed in 22.08

tanmayv25 commented, Aug 10, 2022

Maybe changing max_batch_size changed the timing of the execution, which avoided the original race condition that occurred with larger tensors. Anyway, let’s wait to see whether 22.08 fixes this problem.
