
perfclient shows weirdly low throughput compared to client application

See original GitHub issue

I have a model with the following configuration:

...
max_batch_size: 2
input [
   {
      name: "input_1"
      data_type: TYPE_FP32
      format: FORMAT_NHWC
      dims: [ -1, -1, 1 ]
   }
]
output [
   {
      name: "conv2d_19/Sigmoid"
      data_type: TYPE_FP32
      dims: [ -1, -1, 1 ]
   }
]
instance_group [
   {
      count: 2
      kind: KIND_GPU
   }
]
...

I run perf_client as follows:

./perf_client -v -u localhost:8000 -m model_name --input-data random --shape input_1:912,464,1 --percentile=95
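
The --shape flag supplies a concrete value for the variable dims ([ -1, -1, 1 ]) declared in the config; for reference, a single random FP32 tile matching that shape (a sketch assuming NumPy) would be:

import numpy as np

# One NHWC tile matching --shape input_1:912,464,1 and TYPE_FP32 in the config.
tile_data = np.random.rand(912, 464, 1).astype(np.float32)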

Running the inference with concurrency=1, I get the following results:

  Client:
    Request count: 5
    Throughput: 1 infer/sec
    p50 latency: 1026963 usec
    p90 latency: 1030303 usec
    p95 latency: 1030303 usec
    p99 latency: 1030303 usec
    Avg HTTP time: 1007446 usec (send 985 usec + response wait 569591 usec + receive 436870 usec)
  Server:
    Request count: 6
    Avg request latency: 45210 usec (overhead 7 usec + queue 51 usec + compute 45152 usec)

Inferences/Second vs. Client p95 Batch Latency
Concurrency: 1, throughput: 1 infer/sec, latency 1030303 usec

This throughput of 1 infer/sec is far lower than that of our client application, which processes 996 such image tiles in roughly 30 seconds, including some pre- and post-processing, and it does that through the Python API.
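
Putting those figures side by side (a rough sketch using only the numbers quoted above, nothing newly measured):

# Back-of-the-envelope comparison of the reported numbers.
client_app_rate = 996 / 30.0                    # client application: ~33 tiles/sec
server_compute_usec = 45152                     # server-side "compute" per request
per_instance_rate = 1e6 / server_compute_usec   # ~22 infer/sec per model instance
perf_client_latency_usec = 1030303              # client-side p95 at concurrency 1

print(f"client app       ~{client_app_rate:.1f} tiles/sec")
print(f"server compute   ~{per_instance_rate:.1f} infer/sec per instance")
print(f"perf_client p95  {perf_client_latency_usec / 1e6:.2f} sec/request")

The server-side compute alone would allow roughly 22 infer/sec per model instance, so almost the entire ~1 s per request that perf_client measures is spent outside model execution, in the HTTP send/receive portion of the client-side breakdown. The relevant part of the client application loop, for reference: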

...
# Queue an asynchronous inference request for one tile; the callback puts the
# completed request onto user_data._completed_requests for the loop below.
_ = infer_ctx.async_run(partial(completion_callback, user_data, idx, tile, pads),
                        {input_name: [tile_data]},
                        {output_name: InferContext.ResultFormat.RAW},
                        batch_size)
...

...
# Wait for deferred items from callback functions
(infer_ctx_, request_id_, idx_, tile_, pads_) = user_data._completed_requests.get()
# Process results
result = infer_ctx_.get_async_run_results(request_id_)
...
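
For context, completion_callback and user_data are not shown above; a minimal sketch of what they presumably look like, assuming the usual queue-based callback pattern (the arguments the library appends to the callback are inferred from the tuple unpacked above):

import queue

class UserData:
    """Thread-safe holder that the completion callback fills with finished requests."""
    def __init__(self):
        self._completed_requests = queue.Queue()

def completion_callback(user_data, idx, tile, pads, infer_ctx, request_id):
    # The first four arguments are bound via functools.partial at submit time;
    # the inference context and request id arrive when the request completes.
    user_data._completed_requests.put((infer_ctx, request_id, idx, tile, pads))

user_data = UserData()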

Any thoughts on why perf_client is reporting such low numbers?

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 11 (5 by maintainers)

Top GitHub Comments

1 reaction
deadeyegoodwin commented, Jun 12, 2020

Glad you found the issue!

0 reactions
data-panda commented, Jun 12, 2020

@deadeyegoodwin David, we have finally found the reason for the dismal performance: a proxy server setting that was slowing down the HTTP response time. Thanks for sharing the inference throughput results at your end, which helped us find the actual issue.
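
For anyone hitting a similar client-vs-server latency gap: one quick check (a hedged sketch, since the exact proxy setting changed here is not described in the thread) is to look at the standard proxy environment variables on the client machine; a proxy applied even to localhost traffic can add per-request delays of this magnitude.

import os

# Print the proxy-related environment variables an HTTP client on this
# machine might honor when talking to the server.
for var in ("http_proxy", "https_proxy", "no_proxy",
            "HTTP_PROXY", "HTTPS_PROXY", "NO_PROXY"):
    print(f"{var}={os.environ.get(var, '')}")

If one of these is set, excluding localhost from it (for example via no_proxy) is one thing to try.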
