perf_client fails with "Received message larger than max"
Description
Running perf_client gives the following error:
...
Request concurrency: 11
Failed to maintain requested inference load. Worker thread(s) failed to generate concurrent requests.
Thread [1] had error: GRPC Execute Failed, message: Received message larger than max (4817136 vs. 4194304)
Thread [2] had error: GRPC Execute Failed, message: Received message larger than max (4817136 vs. 4194304)
Thread [4] had error: GRPC Execute Failed, message: Received message larger than max (4817136 vs. 4194304)
Thread [5] had error: GRPC Execute Failed, message: Received message larger than max (4817136 vs. 4194304)
Thread [7] had error: GRPC Execute Failed, message: Received message larger than max (4817136 vs. 4194304)
Thread [8] had error: GRPC Execute Failed, message: Received message larger than max (4817136 vs. 4194304)
Thread [9] had error: GRPC Execute Failed, message: Received message larger than max (4817136 vs. 4194304)
Thread [10] had error: GRPC Execute Failed, message: Received message larger than max (4817136 vs. 4194304)
The input data is a 320x180 jpeg image that is base64 encoded and included in a json file. The model uses the Python backend to convert binary image data to a tensor.
A similar error appears in issue https://github.com/triton-inference-server/server/issues/1776, but that issue is about the Python client. I suppose perf_client uses the C++ client, where the maximum message size should be set to INT32_MAX = 2147483647, not the 4194304 reported by the error.
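For context, 4194304 bytes is gRPC's default maximum receive message length (4 MiB), and INT32_MAX is the value a client typically passes to effectively lift that cap. A minimal stdlib-only sketch of the arithmetic and of the channel options a gRPC client would set (the option names are standard gRPC channel arguments; everything else here is illustrative):

```python
# gRPC's default cap on received messages is 4 MiB, which matches the
# 4194304 in the error message.
DEFAULT_GRPC_MAX_RECV = 4 * 1024 * 1024
print(DEFAULT_GRPC_MAX_RECV)  # 4194304

# The failing response was 4817136 bytes, i.e. this many bytes over the cap:
print(4817136 - DEFAULT_GRPC_MAX_RECV)  # 622832

# Raising the cap to INT32_MAX is the usual "effectively unlimited" setting.
# These are the standard channel arguments a gRPC client would pass, e.g. as
# `options=` when creating a channel.
INT32_MAX = 2**31 - 1
channel_options = [
    ("grpc.max_receive_message_length", INT32_MAX),
    ("grpc.max_send_message_length", INT32_MAX),
]
print(INT32_MAX)  # 2147483647
```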
Triton Information
What version of Triton are you using? 20.09
Are you using the Triton container or did you build it yourself? I built them myself:
- The Triton server image is based on nvcr.io/nvidia/tritonserver:20.09-py3, with additional Python dependencies for the models using the Python backend.
- The perf_client image is based on nvidia/cuda:11.0-devel-ubuntu18.04, where the clients are installed with:
python -m pip install nvidia-pyindex
python -m pip install tritonclient
To Reproduce
Command run:
perf_client -m pre-resnet-imagenet \
--sync \
--measurement-interval 5000 \
--concurrency-range 1:60:2 \
--input-data data_180p.json \
-u ${SERVER_URL}:${SERVER_PORT} \
-i ${PROTOCOL} \
-b 1 \
-f ${OUTPUT_CSV}
with:
- pre-resnet-imagenet: a model using the Python backend to convert binary image data to a tensor
- $PROTOCOL: either GRPC or HTTP (the error appears for both)
- data_180p.json: the json file containing the 320x180 image, base64-encoded according to these instructions. You can find the json file here.
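As an illustration of how such an input file could be produced, here is a hedged stdlib-only sketch that base64-encodes image bytes into a JSON document. The `{"b64": ...}` wrapping and the `"data"` layout are assumptions about the input-data schema, and the image bytes are a stand-in; the linked instructions are the authoritative format:

```python
import base64
import json

# Stand-in for the real 320x180 JPEG: a few bytes starting with the JPEG
# magic number (the actual image content does not matter for the encoding).
image_bytes = b"\xff\xd8\xff\xe0" + b"\x00" * 32

# Base64-encode the binary image so it can live inside a JSON document.
encoded = base64.b64encode(image_bytes).decode("ascii")

# One assumed layout for a --input-data file: a "data" array with one entry
# per request, mapping the input name ("input", matching the model config)
# to a b64-wrapped value.
payload = {"data": [{"input": [{"b64": encoded}]}]}

# Round-trip check: decoding the embedded value recovers the original bytes.
decoded = base64.b64decode(payload["data"][0]["input"][0]["b64"])
print(decoded == image_bytes)  # True
```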
The model configuration is:
name: "pre-resnet-imagenet"
backend: "python"
max_batch_size: 1024
input [
  {
    name: "input"
    data_type: TYPE_STRING
    dims: [ 1 ]
  }
]
output [
  {
    name: "output"
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]
  }
]
instance_group [
  {
    count: 1
    kind: KIND_CPU
  }
]
dynamic_batching { }
version_policy: { all { } }
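The model code itself is not shown in the issue; as an illustration of the kind of preprocessing such a model performs, here is a numpy-only sketch that turns a decoded HxWx3 uint8 image into the FP32 [3, 224, 224] tensor the config declares. The decode step is faked with zeros (real code would JPEG-decode the bytes first) and the nearest-neighbor resize is a crude stand-in for proper image resampling:

```python
import numpy as np

# Fake a decoded 320x180 image as an HxWx3 uint8 array (a real model would
# JPEG-decode the incoming bytes here).
decoded = np.zeros((180, 320, 3), dtype=np.uint8)

# Nearest-neighbor resize to 224x224 via integer index mapping
# (illustrative only; a real pipeline would use a proper resize).
rows = np.arange(224) * decoded.shape[0] // 224
cols = np.arange(224) * decoded.shape[1] // 224
resized = decoded[rows][:, cols]

# Scale to [0, 1] float32 and move channels first, matching dims [3, 224, 224].
tensor = (resized.astype(np.float32) / 255.0).transpose(2, 0, 1)
print(tensor.shape, tensor.dtype)  # (3, 224, 224) float32
```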
Expected behavior: perf_client should run successfully.
Issue Analytics
- Created 3 years ago
- Comments: 8 (5 by maintainers)
Top GitHub Comments
Closing this issue. Feel free to reopen it if you are still facing this.
No, you can use both GRPC and HTTP with both the most recent and the old Python backend versions.
The GRPC in the very old Python backend was used for communication between the Triton core and the user's Python model. The client can use either GRPC or HTTP for communication.