Parallel model inferencing flaky after upgrading Triton
**Description**

I am upgrading Triton from version 21.05 to 22.06, and parallel model inferencing now appears to be flaky. Failures like this occur at random:

```
INFO:root:[StatusCode.INVALID_ARGUMENT] unexpected total byte size 2138112 for input 'images', expecting 2064384
```

My inferencing workload worked on the previous version but breaks with the latest Triton. It's also confusing that the message states a specific expected size even though the model config declares flexible dims (-1).
**Triton Information**

What version of Triton are you using? 22.06
Are you using the Triton container or did you build it yourself? The Triton container, with:

- some additional pip dependencies installed
- a locally built Python backend, for the args['model_repository'] fix from a separate issue
**To Reproduce**

I'm running the following script:
```python
import logging
from concurrent.futures import ThreadPoolExecutor

import numpy as np
from tritonclient.grpc import InferenceServerClient, InferInput, InferRequestedOutput

logging.getLogger().setLevel(logging.INFO)

triton_client = InferenceServerClient('127.0.0.1:8001')

# Batches of varying spatial shape, sent back to back from each thread.
SHAPES = [
    (8, 256, 384, 3),
    (6, 256, 336, 3),
    (2, 384, 256, 3),
    (6, 256, 256, 3),
]

def run():
    for i in range(10):
        print(i)
        try:
            for shape in SHAPES:
                input_array = np.random.randint(0, 255, shape, dtype=np.uint8)
                model_input = InferInput('images', input_array.shape, 'UINT8')
                model_input.set_data_from_numpy(input_array)
                triton_client.infer(
                    model_name='my_model',
                    inputs=[model_input],
                    outputs=[InferRequestedOutput('softmax')],
                )
        except Exception as e:
            logging.info(e)

with ThreadPoolExecutor(4) as pool:
    tasks = [pool.submit(run) for _ in range(4)]
    for f in tasks:
        f.result()
```
and seeing this output:
```
0
0
0
0
INFO:root:[StatusCode.INVALID_ARGUMENT] unexpected total byte size 2138112 for input 'images', expecting 2064384
1
1
1
1
2
INFO:root:[StatusCode.INVALID_ARGUMENT] unexpected total byte size 2138112 for input 'images', expecting 2064384
2
INFO:root:[StatusCode.INVALID_ARGUMENT] unexpected total byte size 2727936 for input 'images', expecting 2359296
3
2
2
3
INFO:root:[StatusCode.INVALID_ARGUMENT] unexpected total byte size 2138112 for input 'images', expecting 2064384
4
INFO:root:[StatusCode.INVALID_ARGUMENT] unexpected total byte size 2727936 for input 'images', expecting 2359296
5
3
3
4
INFO:root:[StatusCode.INVALID_ARGUMENT] unexpected total byte size 2138112 for input 'images', expecting 2064384
6
INFO:root:[StatusCode.INVALID_ARGUMENT] unexpected total byte size 2727936 for input 'images', expecting 2359296
7
4
4
5
INFO:root:[StatusCode.INVALID_ARGUMENT] unexpected total byte size 2138112 for input 'images', expecting 2064384
8
INFO:root:[StatusCode.INVALID_ARGUMENT] unexpected total byte size 2727936 for input 'images', expecting 2359296
6
5
5
9
INFO:root:[StatusCode.INVALID_ARGUMENT] unexpected total byte size 2138112 for input 'images', expecting 2064384
7
INFO:root:[StatusCode.INVALID_ARGUMENT] unexpected total byte size 2727936 for input 'images', expecting 2359296
6
6
8
INFO:root:[StatusCode.INVALID_ARGUMENT] unexpected total byte size 2138112 for input 'images', expecting 2064384
7
7
9
8
8
9
9
```
though the exact interleaving changes every run. With fewer than 4 threads I don't see the issue, but with 4 or more threads it fails at least once per run.
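A plausible reading of the failing byte counts (my own arithmetic, not confirmed by the maintainers): each "unexpected" size is exactly the sum of two of the request payloads from the script, while the "expected" size is the combined batch count times the first request's per-image size. That would suggest the dynamic batcher is merging concurrently arriving requests that have different H×W shapes, which the 500 µs queue delay makes likely once 4 threads are submitting at once:

```python
# Hypothetical reconstruction of the reported byte sizes ('images' is UINT8,
# so byte count equals element count). Shapes are taken from the script above.
print(6 * 256 * 336 * 3 + 2 * 384 * 256 * 3)  # 2138112: the "unexpected" size
print((6 + 2) * 256 * 336 * 3)                # 2064384: batch 8, all at 256x336

print(6 * 256 * 256 * 3 + 6 * 256 * 336 * 3)  # 2727936: the other "unexpected" size
print((6 + 6) * 256 * 256 * 3)                # 2359296: batch 12, all at 256x256
```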
My model config:
name: "my_model"
platform: "tensorflow_savedmodel"
max_batch_size: 16
input {
name: "images"
data_type: TYPE_UINT8
format: FORMAT_NHWC
dims: -1
dims: -1
dims: 3
}
output {
name: "features"
data_type: TYPE_FP32
dims: 1
dims: 1
dims: 1024
}
output {
name: "softmax"
data_type: TYPE_FP32
dims: 1
dims: 1
dims: 11043
label_filename: "labels.txt"
}
dynamic_batching {
preferred_batch_size: 4
max_queue_delay_microseconds: 500
}
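As a stopgap while on 22.06, one possible workaround (my suggestion, untested against this bug) is to pad all images client-side to a single fixed H×W, so the dynamic batcher never sees mixed spatial shapes in one batch. `TARGET_H`, `TARGET_W`, and `pad_to_fixed` below are hypothetical names, not part of the original repro:

```python
import numpy as np

# Hypothetical client-side mitigation: zero-pad every NHWC uint8 batch to one
# fixed spatial size before building the InferInput. TARGET_H/TARGET_W are
# assumptions; pick values >= the largest H and W your pipeline produces.
TARGET_H, TARGET_W = 384, 384

def pad_to_fixed(batch: np.ndarray) -> np.ndarray:
    """Zero-pad a (N, H, W, 3) batch to (N, TARGET_H, TARGET_W, 3)."""
    n, h, w, c = batch.shape
    return np.pad(batch, ((0, 0), (0, TARGET_H - h), (0, TARGET_W - w), (0, 0)))
```

The model would then always receive fixed-shape inputs, at the cost of wasted compute on the padded regions.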
**Expected behavior**

I expect every inference request to succeed, as it did before the upgrade.
**Top GitHub Comments**
This issue should be fixed by https://github.com/triton-inference-server/core/pull/114
Closing this as the issue has been fixed.