Concurrent requests to multiple models cause NaN values in output


Description

I use Triton to host two TensorRT (TRT) models: an object detector and a feature extractor. When both models are called for inference simultaneously (using the Python API: tritonclient.grpc.InferenceServerClient.infer(...)), the feature extractor returns a NumPy array containing NaN values.

This does not happen with multiple concurrent requests to any single model.

Triton Information

docker image: nvcr.io/nvidia/tritonserver:20.12-py3
server_version: 2.6.0
tritonclient==2.3.0
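The server version and model state can also be confirmed from the client before running the reproduction. The following is a minimal sketch, not part of the original report: the helper names `check_server` and `make_client` are hypothetical, and it assumes the gRPC client's `get_server_metadata()` and `is_model_ready()` calls behave as documented for the installed tritonclient version.

```python
def check_server(client, model_names):
    """Return the server version and per-model readiness.

    `client` is expected to be a tritonclient.grpc.InferenceServerClient
    (or any object exposing the same two methods).
    """
    metadata = client.get_server_metadata()
    return {
        "server_version": metadata.version,
        "models_ready": {
            name: client.is_model_ready(name) for name in model_names
        },
    }


def make_client(url="localhost:8001"):
    """Create a gRPC client for the port published in the compose file."""
    # Imported lazily so check_server() can be used (and tested)
    # on machines where tritonclient is not installed.
    import tritonclient.grpc as grpcclient

    return grpcclient.InferenceServerClient(url=url)
```

With the server running, `check_server(make_client(), ["osnet_x0_25_dyn", "yolov4_32"])` should report both models as ready before the concurrent scripts are started.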

To Reproduce

  1. Load Triton with two models: osnet_x0_25_dyn and yolov4_32.

I’m using docker-compose:

version: '3.7'

services:

  triton:
    hostname: triton
    image: nvcr.io/nvidia/tritonserver:20.12-py3
    command: tritonserver --model-repository=/models --strict-model-config=false --grpc-infer-allocation-pool-size=4 --pinned-memory-pool-byte-size=64000000 --log-verbose 0
    ports:
      - '8001:8001'
    volumes:
      - '/home/julien/modelrepo/models:/models'
      - '/home/julien/modelrepo/plugins:/plugins'
    environment:
      - 'LD_PRELOAD=/plugins/liblayerplugin.so'
    ulimits:
      stack:
        soft: 67108864
        hard: 67108864
    shm_size: '1gb'
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: [ 1 ]
              capabilities: [ gpu ]

  2. Run both scripts below at the same time (in two different terminal windows).

feature_extractor.py:

import numpy as np
import tritonclient.grpc as grpcclient
from multiprocessing import Pool

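def send_feature_extractor_request(i):
    # Random input batch of shape (16, 3, 256, 128).
    buffer = np.random.rand(16, 3, 256, 128).astype(np.float32)
    # Each worker process opens its own gRPC connection.
    triton_client = grpcclient.InferenceServerClient(url='localhost:8001')
    res = []
    inputs = [grpcclient.InferInput('input', buffer.shape, 'FP32')]
    outputs = [grpcclient.InferRequestedOutput('output')]
    inputs[0].set_data_from_numpy(buffer)
    # Send the same batch 100 times and record whether each response has NaN.
    for _ in range(100):
        result = triton_client.infer(
            'osnet_x0_25_dyn', inputs=inputs, outputs=outputs)
        output = result.as_numpy('output')
        # A NaN anywhere in the output propagates into the sum.
        res.append(np.isnan(np.sum(output)))

    # Map the worker index to True if any response contained NaN.
    return {i: any(res)}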

if __name__ == '__main__':
    N = 2
    with Pool(N) as p:
        print(p.map(send_feature_extractor_request, np.arange(0, N).tolist()))

and detector.py:

import numpy as np
import tritonclient.grpc as grpcclient
from multiprocessing import Pool

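def send_detector_request(i):
    # Random input batch of shape (8, 3, 512, 512).
    buffer = np.random.rand(8, 3, 512, 512).astype(np.float32)
    # Each worker process opens its own gRPC connection.
    triton_client = grpcclient.InferenceServerClient(url='localhost:8001')
    res = []
    inputs = [grpcclient.InferInput('data', buffer.shape, 'FP32')]
    outputs = [grpcclient.InferRequestedOutput('prob')]
    inputs[0].set_data_from_numpy(buffer)
    # Send the same batch 100 times and record whether each response has NaN.
    for _ in range(100):
        result = triton_client.infer(
            'yolov4_32', inputs=inputs, outputs=outputs)
        output = result.as_numpy('prob')
        # A NaN anywhere in the output propagates into the sum.
        res.append(np.isnan(np.sum(output)))

    # Map the worker index to True if any response contained NaN.
    return {i: any(res)}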

if __name__ == '__main__':
    N = 2
    with Pool(N) as p:
        print(p.map(send_detector_request, np.arange(0, N).tolist()))

You will see that the feature_extractor.py script outputs: [{0: True}, {1: True}]. This means both subprocesses have encountered NaN values in their responses.
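For convenience, the two scripts can be combined into a single one that drives both models at once. The sketch below is an assumption-laden variant, not the original reproduction: it uses threads instead of separate processes (each thread with its own client), the names `WORKLOADS`, `has_nan`, `run_workload` and `run_both` are hypothetical, and whether threads trigger the overlap as reliably as two separate scripts has not been verified.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

# (model name, input name, output name, input shape), copied from the
# two scripts above.
WORKLOADS = [
    ("osnet_x0_25_dyn", "input", "output", (16, 3, 256, 128)),
    ("yolov4_32", "data", "prob", (8, 3, 512, 512)),
]


def has_nan(array):
    """Return True if any element of the array is NaN."""
    return bool(np.isnan(array).any())


def run_workload(workload, iterations=100, url="localhost:8001"):
    """Send `iterations` requests to one model; count responses with NaN."""
    # Imported lazily so has_nan() works without tritonclient installed.
    import tritonclient.grpc as grpcclient

    model, input_name, output_name, shape = workload
    # One client per thread: clients are not shared between threads here.
    client = grpcclient.InferenceServerClient(url=url)
    buffer = np.random.rand(*shape).astype(np.float32)
    inputs = [grpcclient.InferInput(input_name, buffer.shape, "FP32")]
    inputs[0].set_data_from_numpy(buffer)
    outputs = [grpcclient.InferRequestedOutput(output_name)]

    nan_responses = 0
    for _ in range(iterations):
        result = client.infer(model, inputs=inputs, outputs=outputs)
        nan_responses += has_nan(result.as_numpy(output_name))
    return model, nan_responses


def run_both(clients_per_model=2):
    """Run every workload concurrently, `clients_per_model` threads each."""
    jobs = WORKLOADS * clients_per_model
    with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
        return list(pool.map(run_workload, jobs))
```

Calling `run_both()` against the running server returns one `(model, nan_responses)` pair per client thread; any non-zero count for osnet_x0_25_dyn would correspond to the `True` entries described above.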

Expected behavior

I expect both models to return correct values, even when multiple clients send inference requests simultaneously. There should never be a NaN in the output, which means both scripts should print [{0: False}, {1: False}].
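A single True/False flag hides how much of a response is affected. As a possible diagnostic aid (a sketch only; `nan_rows` is a hypothetical helper and it assumes axis 0 of the output is the batch dimension), the rows containing NaN can be listed per response:

```python
import numpy as np


def nan_rows(output):
    """Return the indices along axis 0 of rows containing at least one NaN.

    Assumes `output` has a leading batch dimension.
    """
    if output.size == 0:
        return []
    # Flatten everything except the batch dimension, then test each row.
    flat = output.reshape(output.shape[0], -1)
    return np.flatnonzero(np.isnan(flat).any(axis=1)).tolist()
```

For example, `nan_rows(np.array([[1.0, np.nan], [0.0, 0.0]]))` returns `[0]`. Applied to `result.as_numpy('output')`, it would show whether the whole batch or only some rows are corrupted.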

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Reactions: 1
  • Comments: 16 (9 by maintainers)

Top GitHub Comments

2 reactions
julienschuermans commented, Jul 2, 2021

Any updates on this? Other things we’ve tried:

  • Limit dynamic batching to 1 for both models
  • Limit instance group count to 1 for both models
  • Limit batch size to 1 for both models
  • Limit the number of sequential requests
  • Set FP16 precision via model config for yolov4 model

The problem still occurs, although it happens less often with fewer sequential requests (that still overlap).

Related to #2339?

2 reactions
Tabrizian commented, Jun 25, 2021

Thanks for sharing the models. I have filed a bug against the dev team to further investigate this. They may follow up with you for additional information.

