RNN in ONNX model does not give correct output with batch_size > 1
Description
I have a simple PyTorch bidirectional GRU model that I exported to ONNX. It exports successfully. When I load the model into Python using onnxruntime-0.5.0 I get the correct result (it matches the PyTorch output). However, when I put the model in tensorrt-inference-server I get an incorrect result for batch size > 1 (it matches neither the PyTorch nor the onnxruntime output).
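The property being violated here is that a batched RNN step must produce the same per-row result as running each row on its own. The sketch below demonstrates this invariant with a minimal single-step GRU cell in NumPy; the weights are random stand-ins, not the exported model's parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
I, H = 5, 4                      # input size matches dims [5]; hidden size arbitrary
Wz, Wr, Wn = (rng.standard_normal((I, H)) for _ in range(3))
Uz, Ur, Un = (rng.standard_normal((H, H)) for _ in range(3))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h):
    """One GRU time step; x: (batch, I), h: (batch, H)."""
    z = sigmoid(x @ Wz + h @ Uz)          # update gate
    r = sigmoid(x @ Wr + h @ Ur)          # reset gate
    n = np.tanh(x @ Wn + (r * h) @ Un)    # candidate state
    return (1 - z) * n + z * h

x = rng.standard_normal((2, I))           # batch of 2 inputs
h0 = np.zeros((2, H))
batched = gru_step(x, h0)
row0 = gru_step(x[0:1], h0[0:1])          # same first input, batch of 1
assert np.allclose(batched[0], row0[0])   # per-row outputs must match
```

If the server's batching were correct, the TRTIS output for each request in a batch of 2 would likewise match the single-request output for the same input.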
TRTIS Information
What version of TRTIS are you using? 19.10
Are you using the TRTIS container or did you build it yourself? Using the TRTIS container.
To Reproduce
Steps to reproduce the behavior:
Model configuration (config.pbtxt) for the simple GRU model:
name: "simple_rnn"
platform: "onnxruntime_onnx"
max_batch_size: 2000
dynamic_batching {
  preferred_batch_size: [ 100, 500 ]
  max_queue_delay_microseconds: 50
}
input [
  {
    name: "input"
    data_type: TYPE_FP32
    dims: [ 5 ]
  }
]
output [
  {
    name: "out"
    data_type: TYPE_FP32
    dims: [ 200 ]
  }
]
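Because `max_batch_size` is non-zero, `dims: [5]` describes the shape of a single request's input, and the server prepends the batch dimension when it forms a batch. A minimal sketch of the tensor the model should receive for the two-request batch from the inference code (assuming the exported model's first axis is the batch axis):

```python
import numpy as np

# The two input rows quoted in the issue's inference code.
row0 = np.array([1.5410, -0.2934, -2.1788, 0.5684, -1.0845], dtype=np.float32)
row1 = np.array([1.2410, -0.5934, -1.1788, 0.2684, -1.3845], dtype=np.float32)

# With dims [5] and a batch of 2, the batched tensor is (2, 5).
batch = np.stack([row0, row1])
assert batch.shape == (2, 5)
```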
Python Inference Code
import numpy as np
from tensorrtserver.api import InferContext  # TRTIS Python client; batch_ctx construction elided

eval_batch0 = np.array([[ 1.5410, -0.2934, -2.1788,  0.5684, -1.0845]], dtype=np.float32)
eval_batch1 = np.array([[ 1.2410, -0.5934, -1.1788,  0.2684, -1.3845]], dtype=np.float32)
batch_size = 2
batch_res = batch_ctx.run(
    { "input" : [eval_batch0, eval_batch1] },
    { "out" : InferContext.ResultFormat.RAW },
    batch_size)
output0_data = batch_res['out'][0]  # was `result['out'][0]`, an undefined name
print(output0_data)
Expected behavior
Should get:
[[ 0.05313804 -0.03783077 -0.01000221 -0.06483013 0.01122947 -0.11574925 -0.1188059 0.00879205 0 . . . -0.01771809 0.03329424 -0.06625719 -0.19090976 -0.16422167 -0.13947284 -0.02236746 -0.05213657 0.0291802 -0.01893364]]
But instead I get:
[[-0.12171305 0.08831704 -0.1299032 -0.04495994 -0.09434152 -0.00348916 -0.06000247 0.06794671 -0.00264789 . . . -0.03702144 -0.00485747 0.06768323 0.10949556 0.09059966 -0.00926312 0.01789687 -0.03266058 -0.12896161 0.02893848]]
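A tolerance check over the leading elements of the two vectors quoted above makes the mismatch explicit; this is a plain NumPy comparison, not part of the TRTIS client API.

```python
import numpy as np

# First four elements of the expected (PyTorch/onnxruntime) output
# versus the actual (TRTIS) output quoted in the issue.
expected = np.array([ 0.05313804, -0.03783077, -0.01000221, -0.06483013])
actual   = np.array([-0.12171305,  0.08831704, -0.12990320, -0.04495994])

# These are not small numerical differences; the outputs clearly disagree.
assert not np.allclose(expected, actual, atol=1e-4)
```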
Issue Analytics
- Created 4 years ago
- Comments:17 (9 by maintainers)
Top GitHub Comments
@daquilnp I couldn't reproduce your issue. See the attachments for the scripts I used to generate the PyTorch model (and evaluate the expected result), to evaluate in ORT directly, and to evaluate in TRTIS. All results from these different means match, but I only tested with batch size 1, as the description suggests that the issue happens even for batch size 1.
model_repo.zip scripts_and_result.zip
No, I was suggesting that the export to ONNX format may be broken due to some PyTorch regression, and that this may be why such behavior is observed. However, I see that this may not be the problem, since I note that the converted ONNX model works successfully in Python outside TRTIS.