Time to get_async_run_results is very slow
See original GitHub issue

Hi everyone, I get results using the async_run method. The call to async_run takes 0.007930517196655273 s, but get_async_run_results takes 0.11087822914123535 s, so it is very slow.
```python
import time

# ctx, image_data, input_name, boxes, scores and the initial batch_size
# come from the setup elided above.
request_ids = []
results = []
image_idx = 0
last_request = False

time_infer_start = time.time()
while not last_request:
    input_batch = []
    for idx in range(batch_size):
        input_batch.append(image_data[image_idx])
        image_idx = (image_idx + 1) % len(image_data)
        if image_idx == 0:
            last_request = True
            batch_size = len(input_batch)
            break
    request_ids.append(ctx.async_run(
        {input_name: input_batch},
        {boxes: InferContext.ResultFormat.RAW,
         scores: InferContext.ResultFormat.RAW},
        batch_size))
time_infer_stop = time.time()
print("v_detection inference time: ", time_infer_stop - time_infer_start)

time_post_start = time.time()
# For async, retrieve results in the order the requests were sent
for request_id in request_ids:
    results.append(ctx.get_async_run_results(request_id, True))
time_post_stop = time.time()
print("v_detection post processing time: ", time_post_stop - time_post_start)
print("total_time_process: ", time_post_stop - time_infer_start)
```
How can I reduce the get_async_run_results time?
Issue Analytics
- State: closed
- Created: 4 years ago
- Comments: 8 (5 by maintainers)
Top GitHub Comments
The inference time is determined by the model and the framework. Using the async API doesn't speed up the actual inference time on the server; it just allows the client thread to do something else instead of waiting for the response (which is what the thread does in the non-async API).
Have you looked at perf_client? https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-master-branch-guide/docs/client.html#performance-example-application
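This point can be illustrated with a small, self-contained Python sketch using the standard library's concurrent.futures as a stand-in for the TRTIS client (fake_infer and its 100 ms delay are illustrative assumptions, not part of the real API): submitting work returns almost immediately, while collecting the result blocks for however long the work actually takes.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_infer(x):
    # Stand-in for server-side inference; the real cost lives here.
    time.sleep(0.1)
    return x * 2

executor = ThreadPoolExecutor(max_workers=1)

t0 = time.time()
future = executor.submit(fake_infer, 21)  # analogous to async_run
submit_time = time.time() - t0            # tiny: the call just enqueues work

t0 = time.time()
result = future.result()                  # analogous to get_async_run_results
wait_time = time.time() - t0              # ~0.1 s: the inference itself

executor.shutdown()
```

So the 0.11 s measured above is not overhead inside get_async_run_results; it is the inference (plus network) latency surfacing at the point where the client finally waits for it.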
There is also some information in the blog post listed in the README that talks about using features of TRTIS to get better performance (mostly throughput improvements as latency is more a function of the model and framework).
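For throughput specifically, the win from the async API is pipelining: send many requests before collecting any results so they overlap on the server. A hedged sketch of the idea, again with a simulated fake_infer standing in for a server that is assumed to handle requests concurrently:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_infer(x):
    # Simulated inference with a fixed 50 ms latency.
    time.sleep(0.05)
    return x

requests = list(range(4))

# Synchronous style: wait for each result before sending the next.
t0 = time.time()
with ThreadPoolExecutor(max_workers=4) as ex:
    sync_results = [ex.submit(fake_infer, r).result() for r in requests]
sync_elapsed = time.time() - t0  # roughly 4 x 50 ms

# Pipelined style: submit everything first, then collect in send order.
t0 = time.time()
with ThreadPoolExecutor(max_workers=4) as ex:
    futures = [ex.submit(fake_infer, r) for r in requests]
    pipe_results = [f.result() for f in futures]
pipe_elapsed = time.time() - t0  # roughly one 50 ms latency
```

Per-request latency is unchanged (each call still takes about 50 ms); only aggregate throughput improves, which matches the maintainer's point above.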
Closing. Reopen if you have some more information to report.