Parallel model inferencing flaky after upgrading Triton
**Description**

I am upgrading Triton from version 21.05 to 22.06, and parallel model inferencing now appears to be flaky. Failures like this occur at random:

```
INFO:root:[StatusCode.INVALID_ARGUMENT] unexpected total byte size 2138112 for input 'images', expecting 2064384
```

My inferencing workload worked on the previous version but breaks with the latest Triton. It's also confusing that the message states a specific expected size even though the model config declares flexible dims (-1).
**Triton Information**

What version of Triton are you using? 22.06
Are you using the Triton container or did you build it yourself? The Triton container, with:

- some additional pip dependencies installed
- a locally built Python backend, for the args['model_repository'] fix from a separate issue
**To Reproduce**

I'm running the following script:
```python
import logging
from concurrent.futures import ThreadPoolExecutor

import numpy as np
from tritonclient.grpc import InferenceServerClient, InferInput, InferRequestedOutput

logging.getLogger().setLevel(logging.INFO)

triton_client = InferenceServerClient('127.0.0.1:8001')

# Batches of varying spatial shape, sent back to back from each thread.
SHAPES = [
    (8, 256, 384, 3),
    (6, 256, 336, 3),
    (2, 384, 256, 3),
    (6, 256, 256, 3),
]

def run():
    for i in range(10):
        print(i)
        try:
            for shape in SHAPES:
                input_array = np.random.randint(0, 255, shape, dtype=np.uint8)
                model_input = InferInput('images', input_array.shape, 'UINT8')
                model_input.set_data_from_numpy(input_array)
                triton_client.infer(
                    model_name='my_model',
                    inputs=[model_input],
                    outputs=[InferRequestedOutput('softmax')],
                )
        except Exception as e:
            logging.info(e)

with ThreadPoolExecutor(4) as pool:
    tasks = [pool.submit(run) for _ in range(4)]
    for f in tasks:
        f.result()
```
and seeing this output:
```
0
0
0
0
INFO:root:[StatusCode.INVALID_ARGUMENT] unexpected total byte size 2138112 for input 'images', expecting 2064384
1
1
1
1
2
INFO:root:[StatusCode.INVALID_ARGUMENT] unexpected total byte size 2138112 for input 'images', expecting 2064384
2
INFO:root:[StatusCode.INVALID_ARGUMENT] unexpected total byte size 2727936 for input 'images', expecting 2359296
3
2
2
3
INFO:root:[StatusCode.INVALID_ARGUMENT] unexpected total byte size 2138112 for input 'images', expecting 2064384
4
INFO:root:[StatusCode.INVALID_ARGUMENT] unexpected total byte size 2727936 for input 'images', expecting 2359296
5
3
3
4
INFO:root:[StatusCode.INVALID_ARGUMENT] unexpected total byte size 2138112 for input 'images', expecting 2064384
6
INFO:root:[StatusCode.INVALID_ARGUMENT] unexpected total byte size 2727936 for input 'images', expecting 2359296
7
4
4
5
INFO:root:[StatusCode.INVALID_ARGUMENT] unexpected total byte size 2138112 for input 'images', expecting 2064384
8
INFO:root:[StatusCode.INVALID_ARGUMENT] unexpected total byte size 2727936 for input 'images', expecting 2359296
6
5
5
9
INFO:root:[StatusCode.INVALID_ARGUMENT] unexpected total byte size 2138112 for input 'images', expecting 2064384
7
INFO:root:[StatusCode.INVALID_ARGUMENT] unexpected total byte size 2727936 for input 'images', expecting 2359296
6
6
8
INFO:root:[StatusCode.INVALID_ARGUMENT] unexpected total byte size 2138112 for input 'images', expecting 2064384
7
7
9
8
8
9
9
```
though the exact interleaving changes every run. With fewer than 4 threads I don't see the issue, but with 4 or more threads it fails at least once per run.
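A plausible reading of the failing byte counts (my own arithmetic, not confirmed by the maintainers): each "unexpected" size is exactly the sum of two of the request payloads from the script, while the "expected" size is the combined batch count times the first request's per-image size. That would suggest the dynamic batcher is merging concurrently arriving requests that have different H×W shapes, which the 500 µs queue delay makes likely once 4 threads are submitting at once:

```python
# Hypothetical reconstruction of the reported byte sizes ('images' is UINT8,
# so byte count equals element count). Shapes are taken from the script above.
print(6 * 256 * 336 * 3 + 2 * 384 * 256 * 3)  # 2138112: the "unexpected" size
print((6 + 2) * 256 * 336 * 3)                # 2064384: batch 8, all at 256x336

print(6 * 256 * 256 * 3 + 6 * 256 * 336 * 3)  # 2727936: the other "unexpected" size
print((6 + 6) * 256 * 256 * 3)                # 2359296: batch 12, all at 256x256
```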
My model config:
name: "my_model"
platform: "tensorflow_savedmodel"
max_batch_size: 16
input {
name: "images"
data_type: TYPE_UINT8
format: FORMAT_NHWC
dims: -1
dims: -1
dims: 3
}
output {
name: "features"
data_type: TYPE_FP32
dims: 1
dims: 1
dims: 1024
}
output {
name: "softmax"
data_type: TYPE_FP32
dims: 1
dims: 1
dims: 11043
label_filename: "labels.txt"
}
dynamic_batching {
preferred_batch_size: 4
max_queue_delay_microseconds: 500
}
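As a stopgap while on 22.06, one possible workaround (my suggestion, untested against this bug) is to pad all images client-side to a single fixed H×W, so the dynamic batcher never sees mixed spatial shapes in one batch. `TARGET_H`, `TARGET_W`, and `pad_to_fixed` below are hypothetical names, not part of the original repro:

```python
import numpy as np

# Hypothetical client-side mitigation: zero-pad every NHWC uint8 batch to one
# fixed spatial size before building the InferInput. TARGET_H/TARGET_W are
# assumptions; pick values >= the largest H and W your pipeline produces.
TARGET_H, TARGET_W = 384, 384

def pad_to_fixed(batch: np.ndarray) -> np.ndarray:
    """Zero-pad a (N, H, W, 3) batch to (N, TARGET_H, TARGET_W, 3)."""
    n, h, w, c = batch.shape
    return np.pad(batch, ((0, 0), (0, TARGET_H - h), (0, TARGET_W - w), (0, 0)))
```

The model would then always receive fixed-shape inputs, at the cost of wasted compute on the padded regions.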
**Expected behavior**

I expect every inference request to succeed, as it did before the upgrade.
**Top GitHub Comments**
This issue should be fixed by https://github.com/triton-inference-server/core/pull/114
Closing this as the issue has been fixed.