[Question] About perf_analyzer request & execution count
Hello, I have a question about the perf_analyzer request and execution counts.
root@ai-gpu-2:/workspace# perf_analyzer -m ensemble_models -b 32 --shape INPUT:640,640,3
*** Measurement Settings ***
  Batch size: 32
  Using "time_windows" mode for stabilization
  Measurement window: 5000 msec
  Using synchronous calls for inference
  Stabilizing using average latency

Request concurrency: 1
  Client:
    Request count: 17
    Throughput: 108.8 infer/sec
    ...
  Server:
    Inference count: 672
    Execution count: 21
    Successful request count: 21
Why is the execution count greater than the request count in the example above? What exactly does execution count mean? Thanks in advance.
Issue Analytics
- Created: 2 years ago
- Comments: 9 (4 by maintainers)
Top GitHub Comments
Execution count is the number of times the model was run, while request count is the number of inference requests sent to the model. With dynamic batching enabled, Triton may batch multiple requests into a single model run, in which case the execution count can be smaller than the request count.
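Using the numbers reported in the output above, the relationship between the server-side counters can be checked with a small sketch (the variable names are my own, not perf_analyzer's):

```python
# Counts taken from the perf_analyzer output in the question.
batch_size = 32        # -b 32: each client request carries a batch of 32
execution_count = 21   # server: number of times the model actually ran
inference_count = 672  # server: total individual inferences performed

# Average number of inferences handled per model execution.
avg_batch_per_execution = inference_count / execution_count
print(avg_batch_per_execution)  # 32.0

# Here execution count equals successful request count (21), so the server
# did not merge separate requests: each batch-32 request ran as exactly one
# model execution, and 21 executions x 32 inferences = 672.
assert execution_count * batch_size == inference_count
```

So in this particular run the gap is between the client-side request count (17) and the server-side counters (21), not between requests and executions on the server.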
Client stats and server stats are collected in different places: client stats are gathered inside perf_analyzer itself, while server stats are computed from the delta reported by the server's statistics API. Some variance is therefore expected when comparing client stats against server stats.
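For reference, the dynamic batching mentioned above is enabled per model in its config.pbtxt. A minimal sketch, with illustrative values that are not taken from this issue:

```protobuf
# config.pbtxt -- dynamic batching sketch; values are examples only
dynamic_batching {
  preferred_batch_size: [ 4, 8 ]
  max_queue_delay_microseconds: 100
}
```

With a configuration like this, Triton may delay a request briefly to combine it with others into one model execution, which is exactly the situation where execution count drops below request count.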