
Tensorflow models don't seem to batch properly

See original GitHub issue

Description

TensorFlow models downloaded from the TFOD model zoo load and work just fine, but dynamic batching doesn't seem to work. TF2 models report "model signature does not support batching". TF1 models load with dynamic_batching enabled, but latency scales linearly with concurrency > 1.

Triton Information

What version of Triton are you using? 20.09

Are you using the Triton container or did you build it yourself? Container

To Reproduce

  • Download the TF1 ResNet50 Faster R-CNN model from here
  • Download the TF2 ResNet101 Faster R-CNN model from here
  • Load the models using --strict-model-config false
  • Provide a minimal config.pbtxt enabling dynamic batching as below:

platform: "tensorflow_savedmodel"
max_batch_size: 2
dynamic_batching { }
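
As an editorial aside (not part of the original report), a fuller config.pbtxt of the kind Triton would accept once the model signature exposes a variable batch dimension might look like the sketch below; the input name input_tensor matches the TF2 model used in the perf_client command further down, while the preferred batch sizes and queue delay are illustrative values only:

platform: "tensorflow_savedmodel"
max_batch_size: 8
input [
  {
    name: "input_tensor"
    data_type: TYPE_UINT8
    dims: [ -1, -1, 3 ]
  }
]
dynamic_batching {
  preferred_batch_size: [ 4, 8 ]
  max_queue_delay_microseconds: 100
}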

Use perf_client to evaluate the model over concurrency from 1 to 8 as below:

perf_client -m fasterrcnn101v1640x640 --percentile=95 --shape input_tensor:1,640,640,3 -i gRPC --concurrency-range 1:8

With --log-verbose=1, the TF1 model shows the following:

    "name": "fasterrcnn50_coco_2018_01_28",
    "platform": "tensorflow_savedmodel",
    "backend": "tensorflow",
    "version_policy": {
        "latest": {
            "num_versions": 1
        }
    },
    "max_batch_size": 1,
    "input": [
        {
            "name": "inputs",
            "data_type": "TYPE_UINT8",
            "dims": [
                -1,
                -1,
                3
            ]
        }
    ],
    "output": [
        {
            "name": "detection_scores",
            "data_type": "TYPE_FP32",
            "dims": [
                100
            ]
        },
        {
            "name": "detection_boxes",
            "data_type": "TYPE_FP32",
            "dims": [
                100,
                4
            ]
        },
        {
            "name": "num_detections",
            "data_type": "TYPE_FP32",
            "reshape": {
                "shape": []
            },
            "dims": [
                1
            ]
        },
        {
            "name": "detection_classes",
            "data_type": "TYPE_FP32",
            "dims": [
                100
            ]
        }
    ],
    "batch_input": [],
    "batch_output": [],
    "optimization": {
        "priority": "PRIORITY_DEFAULT",
        "input_pinned_memory": {
            "enable": true
        },
        "output_pinned_memory": {
            "enable": true
        }
    },
    "instance_group": [
        {
            "name": "fasterrcnn50_coco_2018_01_28",
            "kind": "KIND_GPU",
            "count": 1,
            "gpus": [
                0
            ],
            "profile": []
        }
    ],
    "default_model_filename": "model.savedmodel",
    "cc_model_filenames": {},
    "metric_tags": {},
    "parameters": {},
    "model_warmup": []
}

The TF2 model shows the following:

    "name": "fasterrcnn101v1640x640",
    "platform": "tensorflow_savedmodel",
    "backend": "tensorflow",
    "version_policy": {
        "latest": {
            "num_versions": 1
        }
    },
    "max_batch_size": 0,
    "input": [
        {
            "name": "input_tensor",
            "data_type": "TYPE_UINT8",
            "dims": [
                1,
                -1,
                -1,
                3
            ]
        }
    ],
    "output": [
        {
            "name": "detection_scores",
            "data_type": "TYPE_FP32",
            "dims": [
                1,
                300
            ]
        },
        {
            "name": "raw_detection_boxes",
            "data_type": "TYPE_FP32",
            "dims": [
                1,
                300,
                4
            ]
        },
        {
            "name": "detection_boxes",
            "data_type": "TYPE_FP32",
            "dims": [
                1,
                300,
                4
            ]
        },
        {
            "name": "num_detections",
            "data_type": "TYPE_FP32",
            "dims": [
                1
            ]
        },
        {
            "name": "detection_classes",
            "data_type": "TYPE_FP32",
            "dims": [
                1,
                300
            ]
        },
        {
            "name": "detection_multiclass_scores",
            "data_type": "TYPE_FP32",
            "dims": [
                1,
                300,
                91
            ]
        },
        {
            "name": "detection_anchor_indices",
            "data_type": "TYPE_FP32",
            "dims": [
                1,
                300
            ]
        },
        {
            "name": "raw_detection_scores",
            "data_type": "TYPE_FP32",
            "dims": [
                1,
                300,
                91
            ]
        }
    ],
    "batch_input": [],
    "batch_output": [],
    "optimization": {
        "priority": "PRIORITY_DEFAULT",
        "input_pinned_memory": {
            "enable": true
        },
        "output_pinned_memory": {
            "enable": true
        }
    },
    "instance_group": [
        {
            "name": "fasterrcnn101v1640x640",
            "kind": "KIND_GPU",
            "count": 1,
            "gpus": [
                0
            ],
            "profile": []
        }
    ],
    "default_model_filename": "model.savedmodel",
    "cc_model_filenames": {},
    "metric_tags": {},
    "parameters": {},
    "model_warmup": []
}

And the TF2 model fails to load with the following message:

E1007 05:32:54.981619 1 model_repository_manager.cc:899] failed to load 'fasterrcnn101v1640x640' version 1: Internal: unable to autofill for 'fasterrcnn101v1640x640', configuration specified max-batch 2 but model signature does not support batching

Expected behavior

Throughput is expected to increase with increased concurrency. Instead, throughput remains constant and latency scales linearly with concurrency. See the summary results of perf_client below:

Concurrency: 1, throughput: 18.3 infer/sec, latency 62018 usec
Concurrency: 2, throughput: 18.5 infer/sec, latency 127477 usec
Concurrency: 3, throughput: 18.4 infer/sec, latency 190984 usec
Concurrency: 4, throughput: 18.6 infer/sec, latency 236294 usec
Concurrency: 5, throughput: 18.6 infer/sec, latency 292782 usec
Concurrency: 6, throughput: 17.8 infer/sec, latency 362044 usec
Concurrency: 7, throughput: 18.1 infer/sec, latency 426605 usec
Concurrency: 8, throughput: 18.2 infer/sec, latency 469077 usec

For comparison, an ONNX YOLOv4 model gets the following results with optimization and dynamic batching enabled:

Concurrency: 1, throughput: 64.5 infer/sec, latency 17382 usec
Concurrency: 2, throughput: 82.3 infer/sec, latency 29741 usec
Concurrency: 3, throughput: 86.3 infer/sec, latency 45026 usec
Concurrency: 4, throughput: 86 infer/sec, latency 65735 usec
Concurrency: 5, throughput: 101.8 infer/sec, latency 70180 usec
Concurrency: 6, throughput: 115.9 infer/sec, latency 73805 usec
Concurrency: 7, throughput: 128.5 infer/sec, latency 78493 usec
Concurrency: 8, throughput: 140.3 infer/sec, latency 82236 usec

What do I need to do to enable batching with TF models? Do I need to export a saved model with a new input shape (-1, -1, -1, 3) rather than (1, -1, -1, 3)?

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 7 (3 by maintainers)

Top GitHub Comments

1 reaction
deadeyegoodwin commented, Oct 8, 2020

Triton is not batching for the ONNX model. As you note, it does not support batching. Perhaps you think it was batching because increasing the perf_analyzer concurrency resulted in increased throughput. That doesn't necessarily require dynamic batching: having 8 inference requests in flight at all times (concurrency 8) means that any network delays or other latencies can be hidden. Why doesn't the TF model scale with increased concurrency? Perhaps the bottleneck there is the model execution itself, so having more requests in flight does not actually help (although there is usually at least a small improvement going from concurrency 1 to 2). It doesn't directly address your question, but make sure you read https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/optimization.html
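
For completeness, one lever the linked optimization guide covers is running more than one execution instance of the model, so that independent requests can overlap on the GPU even without batching. A minimal sketch (the instance count of 2 is an illustrative value, not a recommendation from this thread) added to config.pbtxt:

instance_group [
  {
    count: 2
    kind: KIND_GPU
    gpus: [ 0 ]
  }
]

Whether this helps depends on whether a single execution already saturates the GPU; if it does, extra instances mostly add memory pressure.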

1 reaction
deadeyegoodwin commented, Oct 8, 2020

In both cases you have models that don’t support batching. See https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/model_configuration.html#inputs-and-outputs.

The model needs to have a variable-sized (-1) dimension for all inputs and outputs for Triton to be able to dynamically batch, and max_batch_size must be > 1. You need to file a ticket against the model zoo to find out why they are not producing models that can support batching.
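
One way to test that suggestion without waiting on the model zoo (a sketch only, not from this thread: the paths are hypothetical, and many TFOD zoo graphs are exported assuming batch size 1, so the wrapped model may still fail at save time or reject batches larger than 1 at runtime) is to wrap the zoo model's serving function in a new signature whose batch dimension is unspecified and re-save it:

# Sketch: re-export a TF2 detection SavedModel with an unspecified (-1 / None)
# batch dimension on the serving signature. Paths are hypothetical.
import tensorflow as tf

loaded = tf.saved_model.load("models/fasterrcnn101v1640x640/1/model.savedmodel")
infer = loaded.signatures["serving_default"]

@tf.function(input_signature=[
    tf.TensorSpec(shape=[None, None, None, 3], dtype=tf.uint8, name="input_tensor")
])
def serving_fn(input_tensor):
    # Delegate to the original serving signature; the keyword must match the
    # signature's input name ("input_tensor" for the TF2 detection zoo models).
    return infer(input_tensor=input_tensor)

tf.saved_model.save(
    loaded,
    "models_batched/fasterrcnn101v1640x640/1/model.savedmodel",
    signatures={"serving_default": serving_fn},
)

If the re-export succeeds and Triton's autofill then reports a -1 batch dimension, the original max_batch_size: 2 plus dynamic_batching { } config becomes valid; if it does not, the model has to come out of the Object Detection API exporter with batching support, which is what the suggested model-zoo ticket would address.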
