
Onnxruntime execute failure

See original GitHub issue

Previously, I used the Triton 21.03 container to serve my ONNX model (an exported NVIDIA Citrinet ASR model), and everything worked fine. Now I need to move to version 21.07+, but I get this cryptic error:

    onnxruntime execute failure 1: Non-zero status code returned while running FusedConv node. Name:'Conv_35_Add_36_Relu_37' Status Message: CUDNN error executing cudnnAddTensor(Base::CudnnHandle(), &alpha, Base::s_.z_tensor, Base::s_.z_data, &alpha, Base::s_.y_tensor, Base::s_.y_data)

Just in case, I re-exported Citrinet to ONNX with the latest onnxruntime version (1.9.0; I also tried 1.8.1) and checked everything again; nothing changed.

I checked several containers:

  • 21.03-py3 - everything works fine; I get server responses with correct inference output.
  • 21.06-py3, 21.07-py3, 21.08-py3 - I get the error pasted above.

CUDA: 11.4, Driver Version: 470.57.02

What would you advise so that inference runs correctly in the newer containers?
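
One way to isolate the failure outside Triton (a minimal sketch, not from the original issue; the model path, dummy input shapes, and dtypes are assumptions) is to load the exported model directly with onnxruntime-gpu and compare runs with graph optimizations enabled and disabled. If only the optimized run hits the CUDNN error, the Conv+Add+Relu fusion is the likely culprit:

    # Sketch: run the exported model through onnxruntime-gpu with and without
    # graph optimizations to check whether the FusedConv fusion triggers the error.
    # "citrinet.onnx" is a placeholder path; dynamic dims are filled with 1 here,
    # so substitute realistic sizes for a meaningful check.
    import numpy as np
    import onnxruntime as ort

    def run(model_path, disable_optimizations):
        opts = ort.SessionOptions()
        if disable_optimizations:
            opts.graph_optimization_level = ort.GraphOptimizationLevel.ORT_DISABLE_ALL
        sess = ort.InferenceSession(model_path, opts,
                                    providers=["CUDAExecutionProvider"])
        feeds = {}
        for inp in sess.get_inputs():
            shape = [d if isinstance(d, int) else 1 for d in inp.shape]
            if "float" in inp.type:                      # e.g. the audio signal input
                feeds[inp.name] = np.random.randn(*shape).astype(np.float32)
            else:                                        # e.g. an int64 length input
                feeds[inp.name] = np.ones(shape, dtype=np.int64)
        return sess.run(None, feeds)

    run("citrinet.onnx", disable_optimizations=True)   # expected to work
    run("citrinet.onnx", disable_optimizations=False)  # expected to reproduce the error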

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Reactions: 2
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

4 reactions
hyperfraise commented, Jun 7, 2022

This “workaround” slashes performance. I feel like this issue should be reopened.

2 reactions
ValeryNikiforov commented, Oct 1, 2021

@CoderHam I found a workaround: everything works fine with optimization { graph : { level : -1 } } in config.pbtxt (i.e., with fusion optimizations disabled). I got this idea from here. And, of course, everything is OK when I use the CPU for inference instead of the GPU.
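
For context, a minimal config.pbtxt sketch with that workaround applied ("citrinet" is a placeholder model name, and the rest of the config is omitted):

    # Placeholder Triton model config; the relevant part is the optimization
    # block, which turns off ONNX Runtime graph optimizations (including the
    # Conv+Add+Relu fusion behind the failing FusedConv node).
    name: "citrinet"
    platform: "onnxruntime_onnx"
    optimization {
      graph {
        level: -1
      }
    }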

Also, according to the ONNX Runtime release notes, v1.8.2 “fixed a crash issue when optimizing Conv->Add->Relu”. However, building the ONNX Runtime backend for Triton against v1.8.2+ didn’t help me. Maybe this information will be useful to you in some way.
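
For anyone who wants to repeat that experiment, a rough sketch of building the Triton ONNX Runtime backend against a pinned ORT version, using the flags documented in the triton-inference-server/onnxruntime_backend README (the versions here are examples and should match your Triton release):

    # Sketch: build the ONNX Runtime backend against ORT 1.8.2 for a 21.08 container.
    git clone https://github.com/triton-inference-server/onnxruntime_backend.git
    cd onnxruntime_backend
    mkdir build && cd build
    cmake -DCMAKE_INSTALL_PREFIX:PATH=`pwd`/install \
          -DTRITON_BUILD_ONNXRUNTIME_VERSION=1.8.2 \
          -DTRITON_BUILD_CONTAINER_VERSION=21.08 ..
    make install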
