Server returns broken JSON responses when using TensorRT model config
Description
I have the following model config:
name: "ner"
platform: "onnxruntime_onnx"
max_batch_size: 128
input [
{
name: "input_ids"
data_type: TYPE_INT64
dims: [ -1 ]
},
{
name: "token_type_ids"
data_type: TYPE_INT64
dims: [ -1 ]
},
{
name: "attention_mask"
data_type: TYPE_INT64
dims: [ -1 ]
}
]
output [
{
name: "start_logits"
data_type: TYPE_FP32
dims: [ -1, 5 ]
},
{
name: "end_logits"
data_type: TYPE_FP32
dims: [ -1, 5 ]
}
]
optimization { execution_accelerators {
gpu_execution_accelerator : [ {
name : "tensorrt"
parameters { key: "precision_mode" value: "FP16" }
parameters { key: "max_workspace_size_bytes" value: "4073741824" }
}]
}}
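For reference, the configuration Triton actually loaded (including the accelerator settings) can be read back over the HTTP API; a minimal sketch in Python, assuming the server is reachable on localhost:8000:

import requests

# Read back the configuration Triton loaded for the "ner" model
# (GET /v2/models/<name>/config, the model-configuration extension).
resp = requests.get("http://localhost:8000/v2/models/ner/config")
resp.raise_for_status()
print(resp.json().get("optimization"))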
The model loads fine. However, when I make inference requests to it, the server returns a payload that is cut off in the middle, such as:
{
"id": "42",
"model_name": "ner",
"model_version": "1",
"outputs": [
{
"name": "end_logits",
"datatype": "FP32",
"shape": [
2,
8,
5
],
"data": [
Triton Information
What version of Triton are you using?
I am using the Triton image nvcr.io/nvidia/tritonserver:21.11-py3.
To Reproduce
Create a model repository with that config and make a POST request to http://localhost:8000/v2/models/ner/infer, such as:
curl -d '{
"id" : "42",
"inputs": [
{
"name": "input_ids",
"shape": [2, 8],
"datatype":"INT64",
"data": [[1,2,3,4,5,6,7,8],[1,2,3,4,5,6,7,8]]
},
{
"name": "token_type_ids",
"shape": [2,8],
"datatype":"INT64",
"data": [[1,2,3,4,5,6,7,8],[1,2,3,4,5,6,7,8]]
},
{
"name": "attention_mask",
"shape": [2, 8],
"datatype":"INT64",
"data": [[1,2,3,4,5,6,7,8],[1,2,3,4,5,6,7,8]]
}
],
"outputs" : [
{
"name" : "start_logits"
},
{
"name" : "end_logits"
}
]
}' -H "Content-Type: application/json" -X POST http://localhost:8000/v2/models/ner/infer
Expected behavior
The full payload with the correct output tensors. The same model without the TensorRT optimization returns the full payload, with the output tensors as expected. Example:
{
"id":"42",
"model_name":"ner",
"model_version":"1",
"outputs":[
{
"name":"end_logits",
"datatype":"FP32",
"shape":[
2,
8,
5
],
"data":[
-9.151423454284668,
-9.307648658752442,
....
-9.209794044494629,
-9.336723327636719,
-9.302064895629883,
-10.113032341003418,
-9.484901428222657
]
},
{
"name":"start_logits",
"datatype":"FP32",
"shape":[
2,
8,
5
],
"data":[
-8.951006889343262,
-9.178339004516602,
....
-9.090873718261719,
-9.383940696716309,
-9.084102630615235,
-9.8538236618042,
-9.339037895202637
]
}
]
}
Top GitHub Comments
@rmccorm4 can you link the PRs for the fix and close this accordingly?
Thank you all for looking into this. I confirm that the model works as expected using ONNX Runtime. I don't know why TensorRT converts the values to NaNs, and I haven't tried TensorRT on this model other than by changing the Triton server configuration. Should I open an issue in the TensorRT repository instead?
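To narrow down whether the NaNs come from TensorRT itself rather than from Triton, one option is to run the same ONNX model directly through ONNX Runtime's TensorRT execution provider; a minimal sketch, assuming the exported model.onnx file and the tensor names from the config above:

import numpy as np
import onnxruntime as ort

# Run the model with the TensorRT execution provider (falling back to CUDA),
# outside of Triton, and check the outputs for NaN.
sess = ort.InferenceSession(
    "model.onnx",  # path to the exported ONNX model (assumed)
    providers=["TensorrtExecutionProvider", "CUDAExecutionProvider"],
)
feed = {
    name: np.array([[1, 2, 3, 4, 5, 6, 7, 8]] * 2, dtype=np.int64)
    for name in ("input_ids", "token_type_ids", "attention_mask")
}
start_logits, end_logits = sess.run(["start_logits", "end_logits"], feed)
print("start_logits contains NaN:", np.isnan(start_logits).any())
print("end_logits contains NaN:", np.isnan(end_logits).any())

If I remember the option name correctly, the TensorRT provider also accepts options such as trt_fp16_enable, which would mirror the FP16 precision_mode used in the Triton config.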