Triton server multiple initialization errors, under kubernetes
Hello, I hope someone can help me. I am seeing the following in the log:
$ kubectl logs test-triton-triton-inference-server-8b7bc6c84-drpn9
=============================
== Triton Inference Server ==
=============================
NVIDIA Release 20.08 (build 15533555)
Copyright (c) 2018-2020, NVIDIA CORPORATION. All rights reserved.
Various files include modifications (c) NVIDIA CORPORATION. All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying
project or file.
find: '/usr/lib/ssl/private': Permission denied
NOTE: The SHMEM allocation limit is set to the default of 64MB. This may be
insufficient for the inference server. NVIDIA recommends the use of the following flags:
nvidia-docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 ...
I1016 08:14:03.413740 1 metrics.cc:184] found 1 GPUs supporting NVML metrics
I1016 08:14:03.419300 1 metrics.cc:193] GPU 0: GeForce GTX 1080 Ti
I1016 08:14:03.419579 1 server.cc:119] Initializing Triton Inference Server
error: creating server: Internal - Unable to create GCS client. Check account credentials.
I am new to the Kubernetes environment; what can be done to make it function properly? (There are 2 identical GPUs installed in the system, and this is a local install. I would prefer to use a persistent volume for the model repository, but that failed. I am trying GCS now, but there are still some problems.) Any comments on the above are appreciated.
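The "Unable to create GCS client. Check account credentials." error usually means the pod has no Google Cloud credentials. A common pattern is to store a service-account key in a Kubernetes Secret, mount it into the Triton container, and point GOOGLE_APPLICATION_CREDENTIALS at it. A minimal sketch of the relevant Deployment fields (the secret name, bucket path, and mount path are hypothetical; adapt them to your chart):

```yaml
# First create the secret from a downloaded service-account key:
#   kubectl create secret generic gcp-creds --from-file=key.json=/path/to/key.json
apiVersion: apps/v1
kind: Deployment
metadata:
  name: triton-inference-server
spec:
  template:
    spec:
      containers:
      - name: triton
        image: nvcr.io/nvidia/tritonserver:20.08-py3
        args: ["tritonserver", "--model-repository=gs://my-bucket/model-repo"]
        env:
        # Standard Google Cloud variable; Triton's GCS client picks it up.
        - name: GOOGLE_APPLICATION_CREDENTIALS
          value: /secret/key.json
        volumeMounts:
        - name: gcp-creds
          mountPath: /secret
          readOnly: true
      volumes:
      - name: gcp-creds
        secret:
          secretName: gcp-creds
```

This is only a fragment of a full Deployment spec; the key point is the Secret volume plus the GOOGLE_APPLICATION_CREDENTIALS environment variable.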
Thanks, Alper
Issue Analytics
- Created 3 years ago
- Comments: 10 (4 by maintainers)
Top GitHub Comments
I’m still not sure if you are saying that you believe something is wrong or not. What are you expecting curl to do in these cases? By default curl just prints the body of the response (as in the last case), but for the first 3 there is nothing in the response body. All these endpoints do is return an HTTP status. You can see that status with -v, or by using the -w flag as I showed; it is entirely up to how you want to use curl.
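To illustrate the point above: the health endpoints carry no response body, so the HTTP status has to be printed explicitly. A sketch, assuming a Triton v2 server reachable at localhost:8000 (the host and port are assumptions):

```
# Print only the status code of the liveness endpoint (e.g. 200 when live):
curl -s -o /dev/null -w '%{http_code}\n' localhost:8000/v2/health/live

# Or inspect the status in the response headers with verbose output:
curl -v localhost:8000/v2/health/ready

# The server metadata endpoint, by contrast, does return a JSON body:
curl -s localhost:8000/v2
```

These commands require a running server, so treat them as a usage sketch rather than something to copy verbatim.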
I was hoping to see some response, in fact, like the one in this link. (As presented a few messages above, /api/status also returns 400, even with -v.) But anyway, directing the client script to pod_IP:port returns the correct inference output, so I can assume there are no problems here. Thank you again.