Triton server multiple initialization errors, under kubernetes
Hello, I hope someone can help me. I am seeing the following in the log:
$ kubectl logs test-triton-triton-inference-server-8b7bc6c84-drpn9
=============================
== Triton Inference Server ==
=============================
NVIDIA Release 20.08 (build 15533555)
Copyright (c) 2018-2020, NVIDIA CORPORATION. All rights reserved.
Various files include modifications (c) NVIDIA CORPORATION. All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying
project or file.
find: '/usr/lib/ssl/private': Permission denied
NOTE: The SHMEM allocation limit is set to the default of 64MB. This may be
insufficient for the inference server. NVIDIA recommends the use of the following flags:
nvidia-docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 ...
I1016 08:14:03.413740 1 metrics.cc:184] found 1 GPUs supporting NVML metrics
I1016 08:14:03.419300 1 metrics.cc:193] GPU 0: GeForce GTX 1080 Ti
I1016 08:14:03.419579 1 server.cc:119] Initializing Triton Inference Server
error: creating server: Internal - Unable to create GCS client. Check account credentials.
I am new to the Kubernetes environment; what can be done to make it function properly? (There are 2 identical GPUs installed in the system, and this is a local install. I would prefer to use a persistent volume for the model repository, but that failed. I am trying GCS now, but there are still some problems.) Any comments on the above are appreciated.
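The "Unable to create GCS client. Check account credentials." error usually means the pod has no Google Cloud credentials. A common pattern is to store a service-account key in a Kubernetes Secret, mount it into the Triton container, and point GOOGLE_APPLICATION_CREDENTIALS at it. A minimal sketch of the relevant Deployment fields (the secret name, bucket path, and mount path are hypothetical; adapt them to your chart):

```yaml
# First create the secret from a downloaded service-account key:
#   kubectl create secret generic gcp-creds --from-file=key.json=/path/to/key.json
apiVersion: apps/v1
kind: Deployment
metadata:
  name: triton-inference-server
spec:
  template:
    spec:
      containers:
      - name: triton
        image: nvcr.io/nvidia/tritonserver:20.08-py3
        args: ["tritonserver", "--model-repository=gs://my-bucket/model-repo"]
        env:
        # Standard Google Cloud variable; Triton's GCS client picks it up.
        - name: GOOGLE_APPLICATION_CREDENTIALS
          value: /secret/key.json
        volumeMounts:
        - name: gcp-creds
          mountPath: /secret
          readOnly: true
      volumes:
      - name: gcp-creds
        secret:
          secretName: gcp-creds
```

This is only a fragment of a full Deployment spec; the key point is the Secret volume plus the GOOGLE_APPLICATION_CREDENTIALS environment variable.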
Thanks, Alper
Issue Analytics
- Created 3 years ago
- Comments: 10 (4 by maintainers)
Top GitHub Comments
I’m still not sure if you are saying that you believe something is wrong or not. What are you expecting curl to do in these cases? By default curl just prints the body of the response (as in the last case), but for the first 3 there is nothing in the response body. All these endpoints do is return an HTTP status. You can see that status with -v, or by using the -w flag as I showed; it is entirely up to how you want to use curl.
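To illustrate the point above: the health endpoints carry no response body, so the HTTP status has to be printed explicitly. A sketch, assuming a Triton v2 server reachable at localhost:8000 (the host and port are assumptions):

```
# Print only the status code of the liveness endpoint (e.g. 200 when live):
curl -s -o /dev/null -w '%{http_code}\n' localhost:8000/v2/health/live

# Or inspect the status in the response headers with verbose output:
curl -v localhost:8000/v2/health/ready

# The server metadata endpoint, by contrast, does return a JSON body:
curl -s localhost:8000/v2
```

These commands require a running server, so treat them as a usage sketch rather than something to copy verbatim.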
I was hoping to see some response, in fact, like the one in this link. (As presented a few messages above, /api/status also returns 400, even with -v.) But anyway, directing the client script to pod_IP:port returns the correct inference output, so I can assume there are no problems here. Thank you again.