[question] Adding timeout options for both client and server with a custom backend and stateful batching
I'm interested in adding good timeout behaviour for a deployment where we use a custom Triton backend with a stateful batcher. I'm using the stateful backend as a reference.
- When timing out on the client side, I can see we use the `stream_timeout` option (as is done here), but it's not clear how this interacts with the server. If I terminate the stream via the client-side timeout, does that cause any code to run on the server side?
- In the stateful batcher example, it looks like there is a separate timer for each stream that evicts the stream from the server when it times out. As before, does this interact with the client in some way? Can we send an error message to the client when this happens?

In general, any advice on managing clients and their state in a robust way would be appreciated. We'd really like to avoid any situation where a client occupies a slot on the server after it has already timed out, or where a client is waiting for a response from a server that has already timed out its slot.
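One robust pattern for the client side is to treat a timeout as terminal for the whole stream: stop reusing the session the moment a deadline expires, so the client never waits on a slot the server may have already evicted. The sketch below models that pattern with only the standard library; it is not the Triton client API, and all names (`StreamSession`, `deliver`, `next_response`) are illustrative.

```python
import queue
import threading

class StreamSession:
    """Toy model of a client-side streaming session with a hard timeout.

    Mirrors the behaviour of a stream-level timeout option: if no response
    arrives within the deadline, the client gives up and marks the session
    closed so no server slot is assumed to be held any more. Illustrative
    only; not part of any Triton client library.
    """

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.responses = queue.Queue()
        self.closed = False

    def deliver(self, response):
        # Called by the transport thread when the server sends a response.
        self.responses.put(response)

    def next_response(self):
        # Block for at most timeout_s; treat a timeout as terminal, the same
        # way a gRPC stream deadline tears the whole stream down.
        if self.closed:
            raise RuntimeError("session already closed")
        try:
            return self.responses.get(timeout=self.timeout_s)
        except queue.Empty:
            self.closed = True
            raise TimeoutError("stream timed out; stop using this session")

if __name__ == "__main__":
    session = StreamSession(timeout_s=0.1)
    # Simulate the server responding shortly after the request is sent.
    threading.Timer(0.02, session.deliver, args=("result-1",)).start()
    print(session.next_response())

    try:
        session.next_response()  # nothing else is coming; this times out
    except TimeoutError:
        print("timed out, session closed:", session.closed)
```

The key design point is that `closed` is checked before every wait, so once a deadline fires, subsequent calls fail fast instead of silently re-waiting on a stream the server may already have torn down.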
Issue Analytics
- State:
- Created a year ago
- Reactions: 1
- Comments: 5 (3 by maintainers)
Top GitHub Comments
I think it will terminate the gRPC stream and close the connection on the server side. (CC @tanmayv25)
I think this is mainly for deleting the storage associated with the correlation IDs stored in the backend. I don't think it interacts with the client; it is mainly for removing the storage once `max_sequence_idle_microseconds` has elapsed.

You might also be interested in the implicit state management API for backends: https://github.com/triton-inference-server/core/blob/main/include/triton/core/tritonbackend.h#L689-L758
Currently, only the TensorRT and ONNX backends implement this API, but you can incorporate it into your own custom backend too. With implicit state management, the state tensors are handled internally by Triton core and you don't need to store them in your backend.
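As a sketch of how implicit state is declared in the model configuration: the `sequence_batching` block of `config.pbtxt` lists a `state` entry pairing an input and output tensor that Triton core carries between requests of the same sequence. The tensor names, data type, and dims below are illustrative, not required names.

```protobuf
sequence_batching {
  state [
    {
      input_name: "INPUT_STATE"    # state fed to the model on each step
      output_name: "OUTPUT_STATE"  # state produced and carried to the next step
      data_type: TYPE_FP32
      dims: [ -1 ]
    }
  ]
}
```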
Thanks, Iman. What you said about the stateful backend is correct. The internal timer is there only to clean up the states for timed-out sequences.
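For reference, the idle timeout discussed above is configured per model in `config.pbtxt`; a minimal sketch (the value is illustrative):

```protobuf
sequence_batching {
  # Evict a sequence's slot and its backend state if no request arrives
  # for this sequence within 10 seconds (value is in microseconds).
  max_sequence_idle_microseconds: 10000000
}
```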