
perfclient shows weirdly low throughput compared to client application

See original GitHub issue

I have a model with the following configuration:

...
max_batch_size: 2
input [
   {
      name: "input_1"
      data_type: TYPE_FP32
      format: FORMAT_NHWC
      dims: [ -1, -1, 1 ]
   }
]
output [
   {
      name: "conv2d_19/Sigmoid"
      data_type: TYPE_FP32
      dims: [ -1, -1, 1 ]
   }
]
instance_group [
   {
      count: 2
      kind: KIND_GPU
   }
]
...

I run perf_client as follows:

./perf_client -v -u localhost:8000 -m model_name --input-data random --shape input_1:912,464,1 --percentile=95
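
The --shape flag supplies a concrete value for the variable dims ([ -1, -1, 1 ]) declared in the config; for reference, a single random FP32 tile matching that shape (a sketch assuming NumPy) would be:

import numpy as np

# One NHWC tile matching --shape input_1:912,464,1 and TYPE_FP32 in the config.
tile_data = np.random.rand(912, 464, 1).astype(np.float32)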

Running the inference with concurrency=1, I get the following results:

  Client:
    Request count: 5
    Throughput: 1 infer/sec
    p50 latency: 1026963 usec
    p90 latency: 1030303 usec
    p95 latency: 1030303 usec
    p99 latency: 1030303 usec
    Avg HTTP time: 1007446 usec (send 985 usec + response wait 569591 usec + receive 436870 usec)
  Server:
    Request count: 6
    Avg request latency: 45210 usec (overhead 7 usec + queue 51 usec + compute 45152 usec)

Inferences/Second vs. Client p95 Batch Latency
Concurrency: 1, throughput: 1 infer/sec, latency 1030303 usec

This throughput of 1 infer/sec is far lower than that of our client application, which processes 996 such image tiles in roughly 30 seconds, including some pre- and post-processing, and it does that through the Python API.
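
Putting those figures side by side (a rough sketch using only the numbers quoted above, nothing newly measured):

# Back-of-the-envelope comparison of the reported numbers.
client_app_rate = 996 / 30.0                    # client application: ~33 tiles/sec
server_compute_usec = 45152                     # server-side "compute" per request
per_instance_rate = 1e6 / server_compute_usec   # ~22 infer/sec per model instance
perf_client_latency_usec = 1030303              # client-side p95 at concurrency 1

print(f"client app       ~{client_app_rate:.1f} tiles/sec")
print(f"server compute   ~{per_instance_rate:.1f} infer/sec per instance")
print(f"perf_client p95  {perf_client_latency_usec / 1e6:.2f} sec/request")

The server-side compute alone would allow roughly 22 infer/sec per model instance, so almost the entire ~1 s per request that perf_client measures is spent outside model execution, in the HTTP send/receive portion of the client-side breakdown. The relevant part of the client application loop, for reference: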

...
# Queue an asynchronous inference request for one tile; the callback puts the
# completed request onto user_data._completed_requests for the loop below.
_ = infer_ctx.async_run(partial(completion_callback, user_data, idx, tile, pads),
                        {input_name: [tile_data]},
                        {output_name: InferContext.ResultFormat.RAW},
                        batch_size)
...

...
# Wait for deferred items from callback functions
(infer_ctx_, request_id_, idx_, tile_, pads_) = user_data._completed_requests.get()
# Process results
result = infer_ctx_.get_async_run_results(request_id_)
...
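
For context, completion_callback and user_data are not shown above; a minimal sketch of what they presumably look like, assuming the usual queue-based callback pattern (the arguments the library appends to the callback are inferred from the tuple unpacked above):

import queue

class UserData:
    """Thread-safe holder that the completion callback fills with finished requests."""
    def __init__(self):
        self._completed_requests = queue.Queue()

def completion_callback(user_data, idx, tile, pads, infer_ctx, request_id):
    # The first four arguments are bound via functools.partial at submit time;
    # the inference context and request id arrive when the request completes.
    user_data._completed_requests.put((infer_ctx, request_id, idx, tile, pads))

user_data = UserData()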

Any thoughts on why perf_client is reporting such low numbers?

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 11 (5 by maintainers)

Top GitHub Comments

1 reaction
deadeyegoodwin commented, Jun 12, 2020

Glad you found the issue!

0 reactions
data-panda commented, Jun 12, 2020

@deadeyegoodwin David, we have finally found the reason for the dismal performance: a proxy server setting that was slowing down the HTTP response time. Thanks for sharing the inference throughput results at your end, which helped us find the actual issue.
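
For anyone hitting a similar client-vs-server latency gap: one quick check (a hedged sketch, since the exact proxy setting changed here is not described in the thread) is to look at the standard proxy environment variables on the client machine; a proxy applied even to localhost traffic can add per-request delays of this magnitude.

import os

# Print the proxy-related environment variables an HTTP client on this
# machine might honor when talking to the server.
for var in ("http_proxy", "https_proxy", "no_proxy",
            "HTTP_PROXY", "HTTPS_PROXY", "NO_PROXY"):
    print(f"{var}={os.environ.get(var, '')}")

If one of these is set, excluding localhost from it (for example via no_proxy) is one thing to try.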
