Problems with MTEncDecModel model in ONNX
There are two issues I am having when exporting MTEncDecModel to ONNX. They might be related in some way, so I am posting them together.
1. ONNX error in embedding in NeMo versions > 1.2:
If I try to run the example below in NeMo 1.5 I get the following error:
```
onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION :
Non-zero status code returned while running Reshape node. Name:'Reshape_67' Status Message: /onnxruntime_src/onnxruntime/core/providers/cpu/tensor/reshape_helper.h:41
onnxruntime::ReshapeHelper::ReshapeHelper(const onnxruntime::TensorShape&, std::vector<long int>&, bool)
gsl::narrow_cast<int64_t>(input_shape.Size()) == size was false.
The input tensor cannot be reshaped to the requested shape. Input shape:{1,44,1024}, requested shape:{2,16,16,64}
```
However, this error does not seem to be present when using an older version of NeMo.
2. Big performance discrepancy when using ONNX version of the model:
I notice a big performance drop when exporting the MTEncDecModel to ONNX. I am trying a minimal setup with greedy search and get about half the BLEU score compared to using MTEncDecModel directly with TopKSequenceGenerator(top_k=1).
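For reference, a minimal sketch of the kind of PyTorch-side baseline I mean (illustrative only, not my actual script; the beam_size attribute is an assumption about the built-in generator, and a beam of 1 behaves like top_k=1 greedy search):

```python
from nemo.collections.nlp.models import MTEncDecModel

# Hypothetical greedy baseline: load the pretrained model and shrink the built-in
# generator to a single hypothesis, similar to TopKSequenceGenerator(top_k=1).
model = MTEncDecModel.from_pretrained('nmt_en_de_transformer12x2')
model.beam_search.beam_size = 1  # assumption: the generator stores its beam width here
pytorch_translation = model.translate(['They are not even 100 metres apart.'])
print(pytorch_translation)
```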
Here is a full script to get the translation with ONNX; you would additionally need to install onnxruntime-gpu:
```python
import numpy as np
import onnxruntime
import torch

from nemo.collections.nlp.models import MTEncDecModel

# Load the model from pretrained:
model = MTEncDecModel.from_pretrained('nmt_en_de_transformer12x2')

# Export all model components to ONNX:
model.encoder.export('encoder.onnx')
model.decoder.export('decoder.onnx')
model.log_softmax.export('classifier.onnx')

# Initialise all the ONNX sessions:
encoder_session = onnxruntime.InferenceSession('encoder.onnx', providers=['CUDAExecutionProvider'])
decoder_session = onnxruntime.InferenceSession('decoder.onnx', providers=['CUDAExecutionProvider'])
classifier_session = onnxruntime.InferenceSession('classifier.onnx', providers=['CUDAExecutionProvider'])

# Preprocess the data using the original NeMo model for simplicity:
TEXT = ['They are not even 100 metres apart: On Tuesday, the new B 33 pedestrian lights in Dorfparkplatz in Gutach became operational - within view of the existing Town Hall traffic lights.']
src_ids, src_mask = model.prepare_inference_batch(TEXT)
src_ids = src_ids.cpu().numpy()                      # convert to numpy for use with ONNX
src_mask = src_mask.cpu().numpy().astype(np.int64)

# Compute the encoder hidden states:
encoder_input = {'input_ids': src_ids, 'encoder_mask': src_mask}
encoder_hidden_state = encoder_session.run(['last_hidden_states'], encoder_input)[0]

# Simple greedy search:
MAX_GENERATION_DELTA = 5
BOS = model.encoder_tokenizer.bos_id
EOS = model.encoder_tokenizer.eos_id
PAD = model.encoder_tokenizer.pad_id


def decode(tgt: np.ndarray, embedding: np.ndarray, src_mask: np.ndarray) -> np.ndarray:
    """Run one decoder + classifier pass over the target prefix generated so far."""
    decoder_input = {
        'input_ids': tgt,
        'decoder_mask': (tgt != PAD).astype(np.int64),
        'encoder_mask': src_mask,
        'encoder_embeddings': embedding,
    }
    decoder_hidden_state = decoder_session.run(['last_hidden_states'], decoder_input)[0]
    log_probs = classifier_session.run(['log_probs'], {'hidden_states': decoder_hidden_state})[0]
    return log_probs


max_out_len = encoder_hidden_state.shape[1] + MAX_GENERATION_DELTA
tgt = np.full(shape=(encoder_hidden_state.shape[0], max_out_len), fill_value=PAD)
tgt[:, 0] = BOS
for i in range(1, max_out_len):
    log_probs = decode(tgt[:, :i], encoder_hidden_state, src_mask)
    # NOTE: the ONNX decoder returns hidden states for every position, which is
    # different to the PyTorch version, so I take the last one
    # (this could be where the error is?)
    next_tokens = log_probs[:, -1].argmax(axis=1)
    tgt[:, i] = next_tokens
    # Stop once every sentence in the batch has produced an EOS token:
    if ((tgt == EOS).sum(axis=1) > 0).all():
        break

tgt_torch = torch.from_numpy(tgt).to('cuda:0')
onnx_translation = model.ids_to_postprocessed_text(tgt_torch, model.decoder_tokenizer, model.target_processor)
```
I have run the above against the newstest2014 set and got a BLEU score of 13 vs 29 when using the MTEncDecModel.translate method. It seems like the ONNX model works well for shorter sentences, but for longer ones it cuts off too soon. Could this be because the PyTorch model uses decoder_mems and the ONNX one doesn't?
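To make the comparison concrete, here is a sketch of the per-sentence sanity check I would run (illustrative, not part of my original script; it reuses TEXT, onnx_translation, and model from above):

```python
# Compare the ONNX greedy output against the built-in PyTorch generator sentence by
# sentence, to check whether the quality gap really grows with source length.
pytorch_translation = model.translate(TEXT)
for src, onnx_out, torch_out in zip(TEXT, onnx_translation, pytorch_translation):
    print(f'src words: {len(src.split())}')
    print(f'  onnx : {onnx_out}')
    print(f'  torch: {torch_out}')
```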
Is there maybe a better way to set up ONNX inference?
Environment overview (please complete the following information)
- Environment Location: Docker on Kubernetes
- Method of NeMo install: Helm chart using the nvcr.io/nvidia/nemo:1.2.0 / 1.5.1 containers
- Additional packages: pip install onnxruntime-gpu==1.10.0
Oh I see, @Vlados09, the issue you're facing is probably linked to a known ONNX export issue with PyTorch.
We recently pushed a workaround to handle it in #3422. Would it be possible for you to install NeMo from source using the main branch? This should most likely fix the error.
Also, we do not recommend using older NeMo versions for exporting NMT models, as there were quite a few critical export issues we've fixed in recent versions, including the need to provide decoder_mems.

What is the value of bsize, num_decoder_attention_layers, seqlen, embedding_dim for the first decode iteration?
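For anyone trying to answer this, a sketch of how those values could be read off using the names from the script above (the attribute path used for the layer count is an assumption about how MTEncDecModel nests its decoder; adjust to your NeMo version):

```python
# Print the quantities asked about, at the first decode step.
bsize, seqlen, embedding_dim = encoder_hidden_state.shape
print('bsize:', bsize)
print('seqlen (encoder):', seqlen)
print('embedding_dim:', embedding_dim)
print('tgt shape at first decode step:', tgt[:, :1].shape)
# Assumed attribute path to the stack of decoder layers:
print('num_decoder_attention_layers:', len(model.decoder.decoder.layers))
```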