Server Queue
I have read the documentation and did not find any mention of a server queue size. As far as I understand from the TRTIS architecture, incoming inference requests are queued by the model schedulers, and when an execution context becomes available the request is passed on for inference. I would like to know the server queue size or, if possible, how to set it. This would help control the incoming request traffic.
Issue Analytics
- State:
- Created 4 years ago
- Comments: 8 (4 by maintainers)
Top GitHub Comments
We will consider queue depth as an enhancement to the statistics API. Note that the statistics already report average time that requests spend in the queue which is likely a good substitute.
Is there any update?
Actually, what we want is the real-time pending request queue size (requests received and scheduled) for auto-scaling. According to the last reply, the average queue time is cumulative, so a developer cannot obtain the queue size over, say, the last 15 minutes for auto-scaling. Is latency the only way to detect the request load?
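The cumulative statistic the maintainer mentions can still be turned into a windowed average by differencing two snapshots of the metrics endpoint. Below is a minimal sketch assuming Triton's Prometheus-format metrics and the counter names `nv_inference_queue_duration_us` and `nv_inference_request_success`; labels are summed across models for simplicity:

```python
# Sketch: windowed average queue time from two snapshots of Triton's
# cumulative Prometheus metrics (counter names assumed, sum over labels).

def parse_metrics(text):
    """Parse Prometheus text format into {metric_name: summed value}."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comments
        name, _, value = line.rpartition(" ")
        base = name.split("{")[0]  # drop labels, aggregate across models
        values[base] = values.get(base, 0.0) + float(value)
    return values

def avg_queue_time_us(snapshot_t0, snapshot_t1):
    """Average per-request queue time (us) over the window between snapshots."""
    m0, m1 = parse_metrics(snapshot_t0), parse_metrics(snapshot_t1)
    d_queue = m1["nv_inference_queue_duration_us"] - m0["nv_inference_queue_duration_us"]
    d_reqs = m1["nv_inference_request_success"] - m0["nv_inference_request_success"]
    return d_queue / d_reqs if d_reqs else 0.0

# Example snapshots, e.g. scraped 15 minutes apart from /metrics:
t0 = 'nv_inference_queue_duration_us{model="m"} 1000\nnv_inference_request_success{model="m"} 10'
t1 = 'nv_inference_queue_duration_us{model="m"} 6000\nnv_inference_request_success{model="m"} 60'
print(avg_queue_time_us(t0, t1))  # -> 100.0 us per request over the window
```

In a Prometheus deployment the same differencing is what `rate()` over the two counters computes, so the windowed average can also be expressed as a query rather than client-side code.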