Triton server periodically stops responding to `ServerLive`, `ServerReady` and `RepositoryIndex` requests
Description
We have an app that periodically (every 15 seconds) queries triton-server for liveness (ServerLive request), readiness (ServerReady request) and model info (RepositoryIndex request). For the past few weeks, every hour (and now every 15 minutes), triton-server stops responding to these requests from the app for about 2-7 minutes.
These are the logs we are getting from our app. The app keeps sending ServerLive requests at the same rate (every 15 seconds) regardless of the response, but it skips sending ServerReady and RepositoryIndex requests if ServerLive fails (a minimal sketch of this polling loop is included below):
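For reference, here is a minimal sketch of that polling loop using Triton's Python tritonclient.grpc package (our real app uses its own gRPC client; the server address below is hypothetical, only the 15-second interval and the skip-on-failure behavior match what is described above):

```python
import time

import tritonclient.grpc as grpcclient
from tritonclient.utils import InferenceServerException

# Hypothetical address: the gRPC port of the triton-server pod.
client = grpcclient.InferenceServerClient(url="triton-server:8001")

while True:
    try:
        live = client.is_server_live()  # ServerLive request
    except InferenceServerException as exc:
        print(f"ServerLive failed: {exc}")
        live = False

    if live:
        # ServerReady and RepositoryIndex are only sent when ServerLive succeeds.
        try:
            ready = client.is_server_ready()             # ServerReady request
            index = client.get_model_repository_index()  # RepositoryIndex request
            print(f"ready={ready}, models={len(index.models)}")
        except InferenceServerException as exc:
            print(f"ServerReady/RepositoryIndex failed: {exc}")

    time.sleep(15)
```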
Increasing the context timeout value in the client app from 10 seconds to an hour does not help. We tried sending RepositoryIndex requests during the bug, but they also time out. On the triton-server side, these are the logs we get when the bug happens (excluding some specific lines/noise that you can see in the filter). Notice the time jump between the two selected lines; that is exactly when the timeout errors start appearing in the client app. In particular, we can see that the RepositoryIndex call with process id 2485 started at 18:11:27.792 and only finished at 18:17:23.783:
During that time jump / bug, ServerLive and ServerReady requests also stop working. However, also during the bug, the HTTP endpoint /v2/health/ready keeps working and reports that triton-server is indeed ready (a sketch comparing the two paths follows below).
The problem always happens to all instances of our client app at the same time, regardless of when each client app instance was started.
We are running triton-server as a single pod in a GKE environment (version 1.21), injected with a Linkerd sidecar proxy. But I think we can rule Linkerd out, since the problem also hits a local instance of our client app (running on a laptop) at the same time as the other instances of the client app (running as pods in the cluster). The local instance is connected to the same triton-server pod through its container port using kubectl port-forward.
The only lead we have is that network traffic sometimes increases dramatically at the same time the bug happens (not always though). We noticed no unusual spikes in other resource usage (CPU, memory, disk I/O) during the bug:
Our models are stored on GCS, but we did not activate repository polling, and to my understanding it should be disabled by default. We start the triton-server process in the container with the command:
tritonserver --model-store=gs://test-repo/tfserving --strict-model-config=true --min-supported-compute-capability=3.7 --log-verbose=1 --backend-config=tensorflow,version=2
Triton Information
Using container tritonserver:22.01-py3
To Reproduce
Send ServerLive, ServerReady and RepositoryIndex requests to triton-server every 15 seconds.
Expected behavior
Always get a response for all requests.
Top GitHub Comments
This most likely means the handler thread (shared between non-infer API calls) is getting stuck. Let's see if upgrading gRPC resolves the issue.
The gRPC upgrade will be included in 22.08, which will be released in late August; you may build from source to test it out before the release.