[question] Adding timeout options for both client and server with a custom backend and stateful batching
I'm interested in adding good timeout behaviour for a deployment where we use a custom Triton backend with a stateful batcher. I'm using the stateful backend as a reference.
- When timing out on the client side, I can see we use the `stream_timeout` option (as is done here), but it's not clear how this interacts with the server. If I terminate the stream via the client-side timeout, does that cause any code to run on the server side?
- In the stateful batcher example, it looks like there is a separate timer for each stream that evicts the stream from the server when it times out. As before, does this interact with the client in some way? Can we send an error message to the client when this happens?

In general, any advice on managing clients and their state in a robust way would be appreciated. We'd really like to avoid any situation where a client occupies a slot on the server after it has already timed out, or where a client is waiting for a response from a server that has already timed out its slot.
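One robust pattern for the client side is to treat a timeout as terminal for the whole stream: stop reusing the session the moment a deadline expires, so the client never waits on a slot the server may have already evicted. The sketch below models that pattern with only the standard library; it is not the Triton client API, and all names (`StreamSession`, `deliver`, `next_response`) are illustrative.

```python
import queue
import threading

class StreamSession:
    """Toy model of a client-side streaming session with a hard timeout.

    Mirrors the behaviour of a stream-level timeout option: if no response
    arrives within the deadline, the client gives up and marks the session
    closed so no server slot is assumed to be held any more. Illustrative
    only; not part of any Triton client library.
    """

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.responses = queue.Queue()
        self.closed = False

    def deliver(self, response):
        # Called by the transport thread when the server sends a response.
        self.responses.put(response)

    def next_response(self):
        # Block for at most timeout_s; treat a timeout as terminal, the same
        # way a gRPC stream deadline tears the whole stream down.
        if self.closed:
            raise RuntimeError("session already closed")
        try:
            return self.responses.get(timeout=self.timeout_s)
        except queue.Empty:
            self.closed = True
            raise TimeoutError("stream timed out; stop using this session")

if __name__ == "__main__":
    session = StreamSession(timeout_s=0.1)
    # Simulate the server responding shortly after the request is sent.
    threading.Timer(0.02, session.deliver, args=("result-1",)).start()
    print(session.next_response())

    try:
        session.next_response()  # nothing else is coming; this times out
    except TimeoutError:
        print("timed out, session closed:", session.closed)
```

The key design point is that `closed` is checked before every wait, so once a deadline fires, subsequent calls fail fast instead of silently re-waiting on a stream the server may already have torn down.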
Issue Analytics
- State:
- Created a year ago
- Reactions: 1
- Comments: 5 (3 by maintainers)
Top GitHub Comments
I think it will terminate the gRPC stream and close the connection on the server side. (CC @tanmayv25)
I think this is mainly for deleting the storage associated with the correlation IDs stored in the backend. I don't think it interacts with the client; it is mainly for removing the storage once `max_sequence_idle_microseconds` has elapsed.

You might also be interested in the implicit state management API for backends: https://github.com/triton-inference-server/core/blob/main/include/triton/core/tritonbackend.h#L689-L758
Currently, only the TensorRT and ONNX backends implement this API, but you can incorporate it into your own custom backend too. With implicit state management, the state tensors are handled internally by Triton core and you don't need to store them in your backend.
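As a sketch of how implicit state is declared in the model configuration: the `sequence_batching` block of `config.pbtxt` lists a `state` entry pairing an input and output tensor that Triton core carries between requests of the same sequence. The tensor names, data type, and dims below are illustrative, not required names.

```protobuf
sequence_batching {
  state [
    {
      input_name: "INPUT_STATE"    # state fed to the model on each step
      output_name: "OUTPUT_STATE"  # state produced and carried to the next step
      data_type: TYPE_FP32
      dims: [ -1 ]
    }
  ]
}
```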
Thanks, Iman. What you said about the stateful backend is correct. The internal timer is there only to clean up the states for timed-out sequences.
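For reference, the idle timeout discussed above is configured per model in `config.pbtxt`; a minimal sketch (the value is illustrative):

```protobuf
sequence_batching {
  # Evict a sequence's slot and its backend state if no request arrives
  # for this sequence within 10 seconds (value is in microseconds).
  max_sequence_idle_microseconds: 10000000
}
```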