perf_client fails with "Received message larger than max"
Description
Running perf_client gives the following error:
...
Request concurrency: 11
Failed to maintain requested inference load. Worker thread(s) failed to generate concurrent requests.
Thread [1] had error: GRPC Execute Failed, message: Received message larger than max (4817136 vs. 4194304)
Thread [2] had error: GRPC Execute Failed, message: Received message larger than max (4817136 vs. 4194304)
Thread [4] had error: GRPC Execute Failed, message: Received message larger than max (4817136 vs. 4194304)
Thread [5] had error: GRPC Execute Failed, message: Received message larger than max (4817136 vs. 4194304)
Thread [7] had error: GRPC Execute Failed, message: Received message larger than max (4817136 vs. 4194304)
Thread [8] had error: GRPC Execute Failed, message: Received message larger than max (4817136 vs. 4194304)
Thread [9] had error: GRPC Execute Failed, message: Received message larger than max (4817136 vs. 4194304)
Thread [10] had error: GRPC Execute Failed, message: Received message larger than max (4817136 vs. 4194304)
The input data is a 320x180 jpeg image that is base64 encoded and included in a json file. The model uses the Python backend to convert binary image data to a tensor.
A similar error appears in issue https://github.com/triton-inference-server/server/issues/1776, but that issue is about the Python client. I suppose perf_client uses the C++ client, where the maximum message size should be set to INT32_MAX = 2147483647, not the 4194304 reported by the error.
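For context, 4194304 bytes is gRPC's default maximum receive message length (4 MiB), and INT32_MAX is the value a client typically passes to effectively lift that cap. A minimal stdlib-only sketch of the arithmetic and of the channel options a gRPC client would set (the option names are standard gRPC channel arguments; everything else here is illustrative):

```python
# gRPC's default cap on received messages is 4 MiB, which matches the
# 4194304 in the error message.
DEFAULT_GRPC_MAX_RECV = 4 * 1024 * 1024
print(DEFAULT_GRPC_MAX_RECV)  # 4194304

# The failing response was 4817136 bytes, i.e. this many bytes over the cap:
print(4817136 - DEFAULT_GRPC_MAX_RECV)  # 622832

# Raising the cap to INT32_MAX is the usual "effectively unlimited" setting.
# These are the standard channel arguments a gRPC client would pass, e.g. as
# `options=` when creating a channel.
INT32_MAX = 2**31 - 1
channel_options = [
    ("grpc.max_receive_message_length", INT32_MAX),
    ("grpc.max_send_message_length", INT32_MAX),
]
print(INT32_MAX)  # 2147483647
```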
Triton Information
What version of Triton are you using? 20.09
Are you using the Triton container or did you build it yourself? I built them myself:
- The Triton server image is based on nvcr.io/nvidia/tritonserver:20.09-py3, with additional Python dependencies for the models using the Python backend.
- The perf_client image is based on nvidia/cuda:11.0-devel-ubuntu18.04, where the clients are installed with:
python -m pip install nvidia-pyindex
python -m pip install tritonclient
To Reproduce
Command run:
perf_client -m pre-resnet-imagenet \
--sync \
--measurement-interval 5000 \
--concurrency-range 1:60:2 \
--input-data data_180p.json \
-u ${SERVER_URL}:${SERVER_PORT} \
-i ${PROTOCOL} \
-b 1 \
-f ${OUTPUT_CSV}
with:
- pre-resnet-imagenet: a model using the Python backend to convert binary image data to a tensor
- $PROTOCOL: either GRPC or HTTP (the error appears for both)
- data_180p.json: the json file containing the 320x180 image, base64-encoded according to these instructions. You can find the json file here.
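As an illustration of how such an input file could be produced, here is a hedged stdlib-only sketch that base64-encodes image bytes into a JSON document. The `{"b64": ...}` wrapping and the `"data"` layout are assumptions about the input-data schema, and the image bytes are a stand-in; the linked instructions are the authoritative format:

```python
import base64
import json

# Stand-in for the real 320x180 JPEG: a few bytes starting with the JPEG
# magic number (the actual image content does not matter for the encoding).
image_bytes = b"\xff\xd8\xff\xe0" + b"\x00" * 32

# Base64-encode the binary image so it can live inside a JSON document.
encoded = base64.b64encode(image_bytes).decode("ascii")

# One assumed layout for a --input-data file: a "data" array with one entry
# per request, mapping the input name ("input", matching the model config)
# to a b64-wrapped value.
payload = {"data": [{"input": [{"b64": encoded}]}]}

# Round-trip check: decoding the embedded value recovers the original bytes.
decoded = base64.b64decode(payload["data"][0]["input"][0]["b64"])
print(decoded == image_bytes)  # True
```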
The model configuration is:
name: "pre-resnet-imagenet"
backend: "python"
max_batch_size: 1024
input [
  {
    name: "input"
    data_type: TYPE_STRING
    dims: [ 1 ]
  }
]
output [
  {
    name: "output"
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]
  }
]
instance_group [
  {
    count: 1
    kind: KIND_CPU
  }
]
dynamic_batching { }
version_policy: { all { } }
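The model code itself is not shown in the issue; as an illustration of the kind of preprocessing such a model performs, here is a numpy-only sketch that turns a decoded HxWx3 uint8 image into the FP32 [3, 224, 224] tensor the config declares. The decode step is faked with zeros (real code would JPEG-decode the bytes first) and the nearest-neighbor resize is a crude stand-in for proper image resampling:

```python
import numpy as np

# Fake a decoded 320x180 image as an HxWx3 uint8 array (a real model would
# JPEG-decode the incoming bytes here).
decoded = np.zeros((180, 320, 3), dtype=np.uint8)

# Nearest-neighbor resize to 224x224 via integer index mapping
# (illustrative only; a real pipeline would use a proper resize).
rows = np.arange(224) * decoded.shape[0] // 224
cols = np.arange(224) * decoded.shape[1] // 224
resized = decoded[rows][:, cols]

# Scale to [0, 1] float32 and move channels first, matching dims [3, 224, 224].
tensor = (resized.astype(np.float32) / 255.0).transpose(2, 0, 1)
print(tensor.shape, tensor.dtype)  # (3, 224, 224) float32
```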
Expected behavior: perf_client should run successfully.
Issue Analytics
- Created 3 years ago
- Comments: 8 (5 by maintainers)
Top GitHub Comments
Closing this issue. Feel free to reopen it if you are still facing this.
No, you can use both GRPC and HTTP with both the most recent and the old Python backend versions.
The GRPC in the very old Python backend was used for communication between the Triton core and the user's Python model. The client can use either GRPC or HTTP for communication.