
Parallel model inferencing flaky after upgrading Triton


Description

I am upgrading Triton from version 21.05 to 22.06, and parallel model inferencing now appears to be flaky. Failures like this occur at random:

INFO:root:[StatusCode.INVALID_ARGUMENT] unexpected total byte size 2138112 for input 'images', expecting 2064384

My inferencing workload worked on the previous version but breaks on the latest Triton. It is also confusing that the message states an expected size even though the model config declares flexible dims.

Triton Information

What version of Triton are you using? 22.06

Are you using the Triton container or did you build it yourself? Triton container, with:

  • some additional pip dependencies installed
  • the Python backend rebuilt to pick up the args['model_repository'] fix from a separate issue

To Reproduce

I'm running the following script:

import logging
from concurrent import futures

import numpy as np
from tritonclient.grpc import InferenceServerClient, InferInput, InferRequestedOutput

logging.getLogger().setLevel(logging.INFO)

triton_client = InferenceServerClient('127.0.0.1:8001')

# Four request shapes with varying batch size and spatial dims, matching the
# flexible dims declared in the model config.
SHAPES = [
    (8, 256, 384, 3),
    (6, 256, 336, 3),
    (2, 384, 256, 3),
    (6, 256, 256, 3),
]

def run():
    for i in range(10):
        print(i)
        try:
            for shape in SHAPES:
                input_array = np.random.randint(0, 255, shape, dtype=np.uint8)
                model_input = InferInput('images', input_array.shape, 'UINT8')
                model_input.set_data_from_numpy(input_array)
                triton_client.infer(
                    model_name='my_model',
                    inputs=[model_input],
                    outputs=[InferRequestedOutput('softmax')],
                )
        except Exception as e:
            logging.info(e)

# Four concurrent workers sharing one client; the failures only show up with
# four or more threads.
with futures.ThreadPoolExecutor(4) as pool:
    tasks = [pool.submit(run) for _ in range(4)]
    for t in tasks:
        t.result()

and seeing this output:

0
0
0
0
INFO:root:[StatusCode.INVALID_ARGUMENT] unexpected total byte size 2138112 for input 'images', expecting 2064384
1
1
1
1
2
INFO:root:[StatusCode.INVALID_ARGUMENT] unexpected total byte size 2138112 for input 'images', expecting 2064384
2
INFO:root:[StatusCode.INVALID_ARGUMENT] unexpected total byte size 2727936 for input 'images', expecting 2359296
3
2
2
3
INFO:root:[StatusCode.INVALID_ARGUMENT] unexpected total byte size 2138112 for input 'images', expecting 2064384
4
INFO:root:[StatusCode.INVALID_ARGUMENT] unexpected total byte size 2727936 for input 'images', expecting 2359296
5
3
3
4
INFO:root:[StatusCode.INVALID_ARGUMENT] unexpected total byte size 2138112 for input 'images', expecting 2064384
6
INFO:root:[StatusCode.INVALID_ARGUMENT] unexpected total byte size 2727936 for input 'images', expecting 2359296
7
4
4
5
INFO:root:[StatusCode.INVALID_ARGUMENT] unexpected total byte size 2138112 for input 'images', expecting 2064384
8
INFO:root:[StatusCode.INVALID_ARGUMENT] unexpected total byte size 2727936 for input 'images', expecting 2359296
6
5
5
9
INFO:root:[StatusCode.INVALID_ARGUMENT] unexpected total byte size 2138112 for input 'images', expecting 2064384
7
INFO:root:[StatusCode.INVALID_ARGUMENT] unexpected total byte size 2727936 for input 'images', expecting 2359296
6
6
8
INFO:root:[StatusCode.INVALID_ARGUMENT] unexpected total byte size 2138112 for input 'images', expecting 2064384
7
7
9
8
8
9
9

though the exact output changes every run. With fewer than 4 threads I don't see the issue, but with 4 or more threads at least one request fails.
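As a rough cross-check (a sketch, assuming one byte per uint8 element), the byte sizes in the errors line up with two of the concurrent requests being added together, even though their spatial dims differ:

import numpy as np

# Per-request payload sizes in bytes for the shapes used in the script above
# (uint8 => 1 byte per element).
shapes = [(8, 256, 384, 3), (6, 256, 336, 3), (2, 384, 256, 3), (6, 256, 256, 3)]
for shape in shapes:
    print(shape, int(np.prod(shape)))
# (8, 256, 384, 3) 2359296
# (6, 256, 336, 3) 1548288
# (2, 384, 256, 3) 589824
# (6, 256, 256, 3) 1179648

# The failing totals look like sums of two different requests:
print(1548288 + 589824)   # 2138112 -> "unexpected total byte size 2138112"
print(1548288 + 1179648)  # 2727936 -> "unexpected total byte size 2727936"

If that reading is right, it would suggest the dynamic batcher (configured below) is merging requests whose height and width differ, which would also fit the failures only appearing once enough threads run concurrently.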

My model config:

name: "my_model"
platform: "tensorflow_savedmodel"
max_batch_size: 16
input {
  name: "images"
  data_type: TYPE_UINT8
  format: FORMAT_NHWC
  dims: -1
  dims: -1
  dims: 3
}
output {
  name: "features"
  data_type: TYPE_FP32
  dims: 1
  dims: 1
  dims: 1024
}
output {
  name: "softmax"
  data_type: TYPE_FP32
  dims: 1
  dims: 1
  dims: 11043
  label_filename: "labels.txt"
}
dynamic_batching {
  preferred_batch_size: 4
  max_queue_delay_microseconds: 500
}
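To confirm what the server actually loaded (a minimal sketch, assuming the same endpoint and model name as in the script above), the served config can be fetched back through the gRPC client:

from tritonclient.grpc import InferenceServerClient

# Fetch the config the server is serving for 'my_model' so the flexible dims
# and dynamic_batching settings can be compared against config.pbtxt.
client = InferenceServerClient('127.0.0.1:8001')
response = client.get_model_config('my_model')
print(response.config.input)
print(response.config.dynamic_batching)

This only verifies that the flexible dims and dynamic_batching settings above are what the server sees; it doesn't change the behaviour of the failing requests.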

Expected behavior

I expect every inference request to succeed, as it did on the previous version.

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 12 (6 by maintainers)

Top GitHub Comments

1 reaction
GuanLuo commented, Aug 17, 2022

0 reactions
krishung5 commented, Oct 13, 2022

Closing this as the issue has been fixed.

