Nonlinear increase of throughput as the number of CPU instances increases

Description
Increasing the number of CPU instances of a model increases the compute infer time. Throughput also does not increase linearly and sometimes decreases.

Triton Information
What version of Triton are you using? r21.12

Are you using the Triton container or did you build it yourself? I have used the Triton container and model analyzer container.
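
For completeness, the matching Triton server image for r21.12 would be pulled roughly as follows; the tag is assumed from the usual NGC naming convention:

# Assumed tag based on the r21.12 release naming; adjust to the release actually in use.
docker pull nvcr.io/nvidia/tritonserver:21.12-py3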

To Reproduce

First, build the model analyzer image

git clone https://github.com/triton-inference-server/model_analyzer -b r21.12
cd model_analyzer
docker build --pull -t model-analyzer .

Create the add_sub model

git clone https://github.com/triton-inference-server/python_backend -b r21.12
cd python_backend
mkdir data
mkdir -p models/add_sub/1/
cp examples/add_sub/model.py models/add_sub/1/model.py
cp examples/add_sub/config.pbtxt models/add_sub/config.pbtxt
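
A simplified sketch of what the copied examples/add_sub/model.py does (the shipped version also reads the output datatypes from the model config); each request performs a trivial elementwise add and subtract:

# Simplified sketch of examples/add_sub/model.py; the shipped example also
# parses the model config for output datatypes. Shown only to make clear that
# each inference is a trivial elementwise add and subtract.
import triton_python_backend_utils as pb_utils

class TritonPythonModel:
    def execute(self, requests):
        responses = []
        for request in requests:
            in0 = pb_utils.get_input_tensor_by_name(request, "INPUT0").as_numpy()
            in1 = pb_utils.get_input_tensor_by_name(request, "INPUT1").as_numpy()
            out_add = pb_utils.Tensor("OUTPUT0", in0 + in1)
            out_sub = pb_utils.Tensor("OUTPUT1", in0 - in1)
            responses.append(
                pb_utils.InferenceResponse(output_tensors=[out_add, out_sub]))
        return responses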

Create config.yaml inside the data directory as follows:

model_repository: /models
profile_models:
  add_sub:
    parameters:
      concurrency:
        start: 32
        stop: 32
    model_config_parameters:
      instance_group:
        - kind: KIND_CPU
          count: [1, 2, 3, 4]
override_output_model_repository: True
client_protocol: grpc
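
Each count in the sweep makes model analyzer generate a separate model config variant (add_sub_i0 through add_sub_i3 in the results below). Roughly, the count-4 variant carries an instance_group stanza like this in its config.pbtxt (a sketch, not copied from the generated files):

instance_group [
  {
    kind: KIND_CPU
    count: 4
  }
]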

Create analyze.yaml inside the data directory as follows:

analysis_models:
  add_sub:
    objectives:
    - perf_throughput
inference_output_fields: [
    'model_name', 'concurrency', 'model_config_path',
    'instance_group', 'perf_throughput',
    'perf_latency_p99','perf_client_response_wait',
    'perf_server_queue', 'perf_server_compute_infer'
]

Start the model analyzer container from inside the python_backend directory

docker run -it --rm --shm-size=2g --gpus all \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v ${PWD}/models:/models \
    -v ${PWD}/data/:/data \
    --net=host --name model-analyzer \
    model-analyzer /bin/bash

Run the following commands inside the container

model-analyzer profile --config-file /data/config.yaml
model-analyzer analyze --config-file /data/analyze.yaml
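
Model analyzer drives perf_analyzer under the hood, so a single configuration can also be spot-checked directly against an already-running server. A rough equivalent for the concurrency-32 gRPC case (assuming tritonserver is already serving the model repository on the default local ports):

# Hypothetical manual check; assumes a running tritonserver on localhost.
perf_analyzer -m add_sub -i grpc --concurrency-range 32:32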

Here are the measurement results

Models (Inference):
Model     Concurrency   Model Config Path   Instance Group   Throughput (infer/sec)   p99 Latency (ms)   Response Wait Time (ms)   Server Queue time (ms)   Server Compute Infer time (ms)  
add_sub   32            add_sub_i3          4/CPU            11325.0                  5.3                2.8                       1.9                      0.3                             
add_sub   32            add_sub_i2          3/CPU            10517.0                  4.0                3.0                       2.4                      0.2                             
add_sub   32            add_sub_i1          2/CPU            8504.0                   5.1                3.7                       3.3                      0.2                             
add_sub   32            add_sub_i0          1/CPU            5049.0                   7.2                6.3                       6.0                      0.1               
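
Relative to the single-instance run, the reported throughput works out to roughly 84% scaling efficiency at 2 instances, 69% at 3, and 56% at 4. A quick calculation from the numbers above:

# Speedup and scaling efficiency computed from the reported throughput.
baseline = 5049.0
throughput = {1: 5049.0, 2: 8504.0, 3: 10517.0, 4: 11325.0}
for n, t in throughput.items():
    print(f"{n} instance(s): {t / baseline:.2f}x speedup, "
          f"{t / (n * baseline):.0%} efficiency")
# 1 instance(s): 1.00x speedup, 100% efficiency
# 2 instance(s): 1.68x speedup, 84% efficiency
# 3 instance(s): 2.08x speedup, 69% efficiency
# 4 instance(s): 2.24x speedup, 56% efficiency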

Describe the models (framework, inputs, outputs), ideally include the model configuration file (if using an ensemble include the model configuration file for that as well).
I have used the add_sub example in the python_backend repo to test this behavior.

Expected behavior
I expect a near-linear increase in throughput and an almost constant compute infer time as the number of CPU instances of a model increases.

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Reactions: 1
  • Comments: 13 (7 by maintainers)

Top GitHub Comments

2 reactions
Tabrizian commented, Jan 13, 2022

> but does it also increase the CPU usage when the server is not idle? Also, the model is only doing a simple add/sub operation. Spending 180% CPU seems a lot. Is that normal?

The idle CPU usage is related to the issue that you've linked to, and it has been fixed. The CPU usage for the non-idle case is, I think, expected. You are measuring the performance under load, and that can lead to significant CPU usage.

Also, the throughput increase can be non-linear because of the shared resources in use. For example, having more instances may lead to more cache misses and thus smaller performance gains.
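
As a rough illustration of how a shared, contended resource caps scaling, an Amdahl-style model with an assumed contended fraction of about 25% already reproduces speedups of this shape (the fraction is a fitted assumption, not a Triton measurement):

# Illustrative only: S(n) = 1 / (s + (1 - s) / n) with an assumed contended
# fraction s. s = 0.25 gives speedups of 1.0, 1.6, 2.0, 2.29, in the same
# ballpark as the measured 1.00, 1.68, 2.08, 2.24.
def speedup(n, s=0.25):
    return 1.0 / (s + (1.0 - s) / n)

print([round(speedup(n), 2) for n in (1, 2, 3, 4)])  # [1.0, 1.6, 2.0, 2.29]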

1 reaction
Tabrizian commented, Jan 31, 2022

Great. Thanks for letting us know. I’ll close this ticket as the original problem is resolved.
