Asynchronous web client sending requests to Triton server
In the image_client.py example, asynchronous requests are appended to the async_requests list:
```python
async_requests.append(
    triton_client.async_infer(
        FLAGS.model_name,
        inputs,
        request_id=str(sent_count),
        model_version=FLAGS.model_version,
        outputs=outputs))
```
Once all the requests have been sent, the client collects the results by calling the blocking get_result() on each returned InferAsyncRequest, which blocks the calling thread:
```python
if FLAGS.async_set:
    # Collect results from the ongoing async requests
    # for HTTP Async requests.
    for async_request in async_requests:
        responses.append(async_request.get_result())
```
I am trying to implement a web service using the FastAPI framework. Within this service I make calls to Triton server to run inference for each incoming request. I would like to know whether Triton provides an async/await or callback pattern, so that I can serve multiple requests concurrently while waiting for the inference result of the current request. I believe the example provided does not really illustrate the power of async, since it ends up calling the blocking get_result()?
Hope my question was clear!
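For context, the gRPC Python client's async_infer() takes a completion callback rather than returning a handle, and that callback can be bridged into async/await. Below is a minimal sketch under a few assumptions: a Triton gRPC endpoint at localhost:8001, an already-built list of InferInput objects, and an illustrative helper name (infer_awaitable) that is not part of tritonclient:

```python
import asyncio

import tritonclient.grpc as grpcclient

# Hypothetical endpoint, for illustration only.
client = grpcclient.InferenceServerClient(url="localhost:8001")

async def infer_awaitable(model_name, inputs, outputs=None):
    """Bridge the callback-based async_infer into an awaitable result."""
    loop = asyncio.get_running_loop()
    future = loop.create_future()

    def on_complete(result, error):
        # The callback fires on a tritonclient worker thread, so hand
        # the result back to the event loop thread-safely.
        if error is not None:
            loop.call_soon_threadsafe(future.set_exception, error)
        else:
            loop.call_soon_threadsafe(future.set_result, result)

    client.async_infer(model_name, inputs, on_complete, outputs=outputs)
    return await future
```

A FastAPI endpoint can then simply await infer_awaitable(...), leaving the event loop free to serve other requests while Triton works.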
Top GitHub Comments
Thanks @alexzubiaga. As it happens, as of a few hours ago I got the async approach working (via gRPC) without needing to use an internal method. Basically, I imported the aio submodule from grpc and used it to establish the channel.
This seems to work just fine, allowing me to use a producer/consumer paradigm in which a number of consumer tasks issue concurrent gRPC requests. These have been handled without issue for the past several hours, although I haven't yet scaled out to a farm of Triton servers.
Skeleton code:
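A minimal sketch of what this producer/consumer setup might look like, assuming a grpc.aio channel to a Triton gRPC endpoint at localhost:8001 and the public service_pb2/service_pb2_grpc modules shipped with tritonclient; names such as NUM_CONSUMERS and "my_model" are illustrative, and the request payload is left schematic since inputs depend on the model:

```python
import asyncio

import grpc
from tritonclient.grpc import service_pb2, service_pb2_grpc

NUM_CONSUMERS = 4  # illustrative concurrency level

async def consumer(stub, queue):
    """Pull ModelInferRequest messages off the queue and await the RPC."""
    while True:
        request = await queue.get()
        if request is None:  # sentinel: no more work
            queue.task_done()
            break
        response = await stub.ModelInfer(request)
        # ... hand `response` (a ModelInferResponse) to whoever needs it
        queue.task_done()

async def main():
    async with grpc.aio.insecure_channel("localhost:8001") as channel:
        stub = service_pb2_grpc.GRPCInferenceServiceStub(channel)
        queue = asyncio.Queue(maxsize=32)
        consumers = [asyncio.create_task(consumer(stub, queue))
                     for _ in range(NUM_CONSUMERS)]

        # Producer: build and enqueue requests.
        for i in range(100):
            request = service_pb2.ModelInferRequest()
            request.model_name = "my_model"  # hypothetical model name
            request.id = str(i)
            # ... populate request.inputs / request.raw_input_contents
            await queue.put(request)

        # Shut the consumers down once the queue drains.
        for _ in consumers:
            await queue.put(None)
        await asyncio.gather(*consumers)

asyncio.run(main())
```

With the aio stubs, each in-flight ModelInfer call is just another awaitable, so a handful of consumer tasks is enough to keep many requests concurrently in flight over a single channel.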