
Possibility to recreate new connection to the triton server

See original GitHub issue

Description: Hi, this probably does not fall into the bug category, and I was not sure whether it would rather fit a feature request. Anyway, here it is: we are using the gRPC streaming client tc::InferenceServerGrpcClient (created with the InferenceServerGrpcClient::Create() function). The stream is started with StartStream(...) and we call AsyncStreamInfer(...) to make the infer request. When the client gets the results, the pointer is freed and everything repeats.
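For concreteness, here is a minimal sketch of the usage pattern described above, based on the Triton C++ client API from grpc_client.h; the server URL, model name, and input setup are placeholders:

```cpp
#include <memory>
#include <vector>

#include "grpc_client.h"  // from the triton-client repository

namespace tc = triton::client;

int main() {
  // Create() builds (or reuses) a gRPC channel to the server.
  std::unique_ptr<tc::InferenceServerGrpcClient> client;
  tc::Error err =
      tc::InferenceServerGrpcClient::Create(&client, "localhost:8001");
  if (!err.IsOk()) return 1;

  // Start the bidirectional stream; results are delivered to the callback.
  err = client->StartStream([](tc::InferResult* result) {
    // ... consume the result, then free it and repeat ...
    delete result;
  });

  tc::InferOptions options("my_model");  // placeholder model name
  std::vector<tc::InferInput*> inputs;   // input tensors, set up elsewhere
  err = client->AsyncStreamInfer(options, inputs);

  // Destroying the client does not necessarily close the underlying TCP
  // connection -- channels are cached, which is the behavior at issue here.
  return 0;
}
```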

I thought that when the client is deleted, all of its network connections are shut down as well; however, that is not the case. I observed an increasing number of TCP connections to the Triton server. I don't fully understand when they are created, as it is not after each call to the Create() function, and after some time I saw "too many open files" errors on the server. I was able to partially work around this by reusing the stream, but that is not optimal because I needed to increase the stream timeout, so this rather seems like a bug to me. In any case, we would like to control when new TCP connections are created, because our clients sit behind a load balancer and we would like to open new connections when servers are about to be scaled down.
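A minimal sketch of the stream-reuse workaround mentioned above, assuming StartStream's stream_timeout parameter (specified in microseconds); the timeout value and loop bound are illustrative:

```cpp
// Keep one long-lived stream instead of tearing it down per request.
// stream_timeout is in microseconds (0 means no timeout); raising it is
// the trade-off mentioned above.
err = client->StartStream(
    [](tc::InferResult* result) {
      // ... handle the result ...
      delete result;
    },
    /*enable_stats=*/true,
    /*stream_timeout=*/60 * 1000 * 1000);  // 60 s, illustrative value

// All subsequent requests ride on the same stream / TCP connection.
for (int i = 0; i < num_requests; ++i) {
  err = client->AsyncStreamInfer(options, inputs);
}
```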

So my question is: is it possible to somehow control when new TCP connections are created? I did not find anything in the client API, so maybe it can be done with some gRPC environment variables?

Triton Information: r22.02

Are you using the Triton container or did you build it yourself? Triton container.

To Reproduce: see above.

Expected behavior: the client user should be able to control the creation of TCP connections.

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 9 (6 by maintainers)

Top GitHub Comments

1 reaction
jbkyang-nvi commented on Oct 28, 2022

@JindrichD the best practices for gRPC in high-load applications listed here include reusing TCP connections. I might be missing something in my understanding, but some questions:

  1. Why do you need to create a new connection every time? The client actually defaults to reusing connections, since closing old connections takes more time (a sketch follows this list).
     a. If you want a new connection every time you create a client, you can set force_new_connection to true and the new connection will replace the old one. However, if you are reusing connections, you shouldn't be getting the "too many open files" message...
     b. Are you saying you are using a new URL every time you swap out clients? If so, then it makes sense that there is a "too many open files" message. However, if you are only doing load balancing, do the connections to the downscaled server need new URLs?
     c. Along the same line of thought, you said you managed to fix your problem by reusing connections. Why is that a sub-optimal solution?

  2. To your point on the shared std::map that manages all the connections: we can create a flag to close the connection when each client is destructed... this is a feature the gRPC client does not currently have (a simplified illustration of this channel map appears after the cc note below).
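As a sketch of point 1a: the exact parameter name is an assumption here. Recent grpc_client.h headers expose a trailing use_cached_channel flag on Create() (defaulting to true), which appears to be what "force_new_connection" refers to; check the header for your client release:

```cpp
// Sketch: one fresh channel per client. Setting use_cached_channel to
// false bypasses the shared channel cache. Verify the exact Create()
// signature for your client release.
std::unique_ptr<tc::InferenceServerGrpcClient> fresh_client;
tc::Error err = tc::InferenceServerGrpcClient::Create(
    &fresh_client, "localhost:8001",
    /*verbose=*/false,
    /*use_ssl=*/false, tc::SslOptions(),
    tc::KeepAliveOptions(),
    /*use_cached_channel=*/false);  // do not reuse a cached channel
```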

cc: @tanmayv25 if I’m missing something from the grpc streaming functionality
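For context on point 2: the client library keeps gRPC channels in a shared map keyed by server URL, so channels can outlive the client objects that created them. A simplified illustration of that caching pattern (conceptual only, not the actual triton-client code):

```cpp
#include <map>
#include <memory>
#include <string>

#include <grpcpp/grpcpp.h>

// Conceptual illustration of a channel cache keyed by server URL.
std::shared_ptr<grpc::Channel> GetChannel(const std::string& url) {
  static std::map<std::string, std::shared_ptr<grpc::Channel>> cache;
  auto it = cache.find(url);
  if (it != cache.end()) {
    return it->second;  // cache hit: no new TCP connection
  }
  auto channel =
      grpc::CreateChannel(url, grpc::InsecureChannelCredentials());
  cache[url] = channel;  // the cached entry keeps the connection alive
                         // even after every client using it is destroyed
  return channel;
}
```

This is also why using a distinct URL per client steadily grows the number of open connections (and file descriptors) until the server reports "too many open files".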

0 reactions
dyastremsky commented on Nov 29, 2022

Closing issue due to lack of activity. If you need further support, please let us know and we can reopen the issue.

Read more comments on GitHub >

