
Status Message: CUDNN error executing cudnnFindConvolutionForwardAlgorithmEx

See original GitHub issue

Description

I am working on a Triton C API application in combination with ROS1 to run inference with a custom YOLOv5 model on ROS1 image topics. I have a working implementation of the same model in gRPC mode, so the model and the config are correct. When I send the normalized image to Triton, it gives me the error below. I have tried googling this error but cannot make sense of what exactly the problem is. Some insights would help a lot.

I0609 12:54:24.923918 21492 libtorch.cc:1381] TRITONBACKEND_Initialize: pytorch
I0609 12:54:24.923949 21492 libtorch.cc:1391] Triton TRITONBACKEND API version: 1.9
I0609 12:54:24.923953 21492 libtorch.cc:1397] 'pytorch' TRITONBACKEND API version: 1.9
2022-06-09 12:54:29.035423: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
I0609 12:54:29.076962 21492 tensorflow.cc:2181] TRITONBACKEND_Initialize: tensorflow
I0609 12:54:29.076984 21492 tensorflow.cc:2191] Triton TRITONBACKEND API version: 1.9
I0609 12:54:29.076989 21492 tensorflow.cc:2197] 'tensorflow' TRITONBACKEND API version: 1.9
I0609 12:54:29.076992 21492 tensorflow.cc:2221] backend configuration:
{}
I0609 12:54:29.186855 21492 onnxruntime.cc:2400] TRITONBACKEND_Initialize: onnxruntime
I0609 12:54:29.186876 21492 onnxruntime.cc:2410] Triton TRITONBACKEND API version: 1.9
I0609 12:54:29.186880 21492 onnxruntime.cc:2416] 'onnxruntime' TRITONBACKEND API version: 1.9
I0609 12:54:29.186884 21492 onnxruntime.cc:2446] backend configuration:
{}
I0609 12:54:29.236464 21492 openvino.cc:1207] TRITONBACKEND_Initialize: openvino
I0609 12:54:29.236483 21492 openvino.cc:1217] Triton TRITONBACKEND API version: 1.9
I0609 12:54:29.236488 21492 openvino.cc:1223] 'openvino' TRITONBACKEND API version: 1.9
I0609 12:54:30.318676 21492 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7fbd36000000' with size 268435456
I0609 12:54:30.319079 21492 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I0609 12:54:30.319094 21492 cuda_memory_manager.cc:105] CUDA memory pool is created on device 1 with size 67108864
W0609 12:54:31.182282 21492 server.cc:206] failed to enable peer access for some device pairs
I0609 12:54:31.184446 21492 model_repository_manager.cc:1077] loading: YOLOv5nCOCO:1
I0609 12:54:31.284776 21492 model_repository_manager.cc:1077] loading: YOLOv5nCROP:1
I0609 12:54:31.284788 21492 onnxruntime.cc:2481] TRITONBACKEND_ModelInitialize: YOLOv5nCOCO (version 1)
I0609 12:54:31.285700 21492 onnxruntime.cc:2524] TRITONBACKEND_ModelInstanceInitialize: YOLOv5nCOCO (GPU device 0)
I0609 12:54:31.386580 21492 model_repository_manager.cc:1077] loading: FCOS_detectron:1
I0609 12:54:32.209536 21492 onnxruntime.cc:2481] TRITONBACKEND_ModelInitialize: YOLOv5nCROP (version 1)
I0609 12:54:32.209978 21492 libtorch.cc:1430] TRITONBACKEND_ModelInitialize: FCOS_detectron (version 1)
I0609 12:54:32.210246 21492 libtorch.cc:293] Optimized execution is enabled for model instance 'FCOS_detectron'
I0609 12:54:32.210254 21492 libtorch.cc:311] Inference Mode is disabled for model instance 'FCOS_detectron'
I0609 12:54:32.210258 21492 libtorch.cc:406] NvFuser is not specified for model instance 'FCOS_detectron'
I0609 12:54:32.210272 21492 onnxruntime.cc:2524] TRITONBACKEND_ModelInstanceInitialize: YOLOv5nCOCO (GPU device 1)
I0609 12:54:33.016676 21492 onnxruntime.cc:2524] TRITONBACKEND_ModelInstanceInitialize: YOLOv5nCROP (GPU device 0)
I0609 12:54:33.017054 21492 model_repository_manager.cc:1231] successfully loaded 'YOLOv5nCOCO' version 1
I0609 12:54:33.093639 21492 libtorch.cc:1474] TRITONBACKEND_ModelInstanceInitialize: FCOS_detectron (GPU device 0)
I0609 12:54:33.450545 21492 onnxruntime.cc:2524] TRITONBACKEND_ModelInstanceInitialize: YOLOv5nCROP (GPU device 1)
I0609 12:54:33.503772 21492 libtorch.cc:1474] TRITONBACKEND_ModelInstanceInitialize: FCOS_detectron (GPU device 1)
I0609 12:54:33.504063 21492 model_repository_manager.cc:1231] successfully loaded 'YOLOv5nCROP' version 1
I0609 12:54:33.851533 21492 model_repository_manager.cc:1231] successfully loaded 'FCOS_detectron' version 1
I0609 12:54:33.851605 21492 server.cc:549] 
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0609 12:54:33.851659 21492 server.cc:576] 
+-------------+-------------------------------------------------------------------------+--------+
| Backend     | Path                                                                    | Config |
+-------------+-------------------------------------------------------------------------+--------+
| pytorch     | /opt/tritonserver/backends/pytorch/libtriton_pytorch.so                 | {}     |
| tensorflow  | /opt/tritonserver/backends/tensorflow1/libtriton_tensorflow1.so         | {}     |
| onnxruntime | /opt/tritonserver/backends/onnxruntime/libtriton_onnxruntime.so         | {}     |
| openvino    | /opt/tritonserver/backends/openvino_2021_4/libtriton_openvino_2021_4.so | {}     |
+-------------+-------------------------------------------------------------------------+--------+

I0609 12:54:33.851703 21492 server.cc:619] 
+----------------+---------+--------+
| Model          | Version | Status |
+----------------+---------+--------+
| FCOS_detectron | 1       | READY  |
| YOLOv5nCOCO    | 1       | READY  |
| YOLOv5nCROP    | 1       | READY  |
+----------------+---------+--------+

I0609 12:54:33.892306 21492 metrics.cc:650] Collecting metrics for GPU 0: NVIDIA GeForce RTX 3070
I0609 12:54:33.892335 21492 metrics.cc:650] Collecting metrics for GPU 1: NVIDIA GeForce RTX 3070
I0609 12:54:33.892721 21492 tritonserver.cc:2123] 
+----------------------------------+--------------------------------------------------------------------------+
| Option                           | Value                                                                    |
+----------------------------------+--------------------------------------------------------------------------+
| server_id                        | triton                                                                   |
| server_version                   | 2.21.0                                                                   |
| server_extensions                | classification sequence model_repository model_repository(unload_depende |
|                                  | nts) schedule_policy model_configuration system_shared_memory cuda_share |
|                                  | d_memory binary_tensor_data statistics trace                             |
| model_repository_path[0]         | /opt/model_repo/                                                         |
| model_control_mode               | MODE_POLL                                                                |
| strict_model_config              | 1                                                                        |
| rate_limit                       | OFF                                                                      |
| pinned_memory_pool_byte_size     | 268435456                                                                |
| cuda_memory_pool_byte_size{0}    | 67108864                                                                 |
| cuda_memory_pool_byte_size{1}    | 67108864                                                                 |
| response_cache_byte_size         | 0                                                                        |
| min_supported_compute_capability | 7.5                                                                      |
| strict_readiness                 | 1                                                                        |
| exit_timeout                     | 30                                                                       |
+----------------------------------+--------------------------------------------------------------------------+

Server Health: live 1, ready 1
Server Metadata:
{"name":"triton","version":"2.21.0","extensions":["classification","sequence","model_repository","model_repository(unload_dependents)","schedule_policy","model_configuration","system_shared_memory","cuda_shared_memory","binary_tensor_data","statistics","trace"]}
2022-06-09 12:54:35.981968507 [E:onnxruntime:log, cuda_call.cc:118 CudaCall] CUDNN failure 4: CUDNN_STATUS_INTERNAL_ERROR ; GPU=0 ; hostname=agrigaia-ws3-u ; expr=cudnnFindConvolutionForwardAlgorithmEx( s_.handle, s_.x_tensor, s_.x_data, s_.w_desc, s_.w_data, s_.conv_desc, s_.y_tensor, s_.y_data, 1, &algo_count, &perf, algo_search_workspace.get(), max_ws_size); 
2022-06-09 12:54:35.981998167 [E:onnxruntime:, sequential_executor.cc:346 Execute] Non-zero status code returned while running Conv node. Name:'Conv_0' Status Message: CUDNN error executing cudnnFindConvolutionForwardAlgorithmEx( s_.handle, s_.x_tensor, s_.x_data, s_.w_desc, s_.w_data, s_.conv_desc, s_.y_tensor, s_.y_data, 1, &algo_count, &perf, algo_search_workspace.get(), max_ws_size)
2022-06-09 12:54:35.982022932 [E:onnxruntime:log, cuda_call.cc:118 CudaCall] CUDA failure 700: an illegal memory access was encountered ; GPU=0 ; hostname=agrigaia-ws3-u ; expr=cudaEventRecord(current_deferred_release_event, static_cast<cudaStream_t>(GetComputeStream())); 
error: response status: Internal - onnx runtime error 1: Non-zero status code returned while running Conv node. Name:'Conv_0' Status Message: CUDNN error executing cudnnFindConvolutionForwardAlgorithmEx( s_.handle, s_.x_tensor, s_.x_data, s_.w_desc, s_.w_data, s_.conv_desc, s_.y_tensor, s_.y_data, 1, &algo_count, &perf, algo_search_workspace.get(), max_ws_size)

Triton information

I am working inside a Docker dev container built on the Triton 22.04 base image, with ROS1 Noetic and OpenCV 4.2.0 installed on top of it:

  • Base image: nvcr.io/nvidia/tritonserver:22.04-py3
  • Ubuntu 20.04
  • ROS Noetic
  • OpenCV 4.2.0

Model config file:

name: "YOLOv5nCOCO"
platform: "onnxruntime_onnx"
max_batch_size : 0
input [
  {
    name: "images"
    data_type: TYPE_FP32
    format: FORMAT_NCHW
    dims: [ 3, 512, 512 ]
    reshape { shape: [ 1, 3, 512, 512 ] }
  }
]
output [
  {
    name: "output"
    data_type: TYPE_FP32
    dims: [1, 16128, 85]
  }
]

Steps to reproduce

I have written my main.cpp based on your simple.cc example file. Unfortunately, I cannot share the full project. Here are my preprocessing steps, from receiving the ROS image topic until the data is handed to Triton:

cv_ptr = cv_bridge::toCvCopy(msg, "rgb8");
      
// Store the values of the OpenCV-compatible image into the current_frame variable
cv::Mat current_frame = cv_ptr->image;

// Preprocessing of the image
// normalize the image to [0, 1] and convert to float
current_frame.convertTo(current_frame, CV_32F, 1.0/255.0, 0);
// resize the image to the model input size
cv::resize(current_frame, current_frame, cv::Size(512, 512), 0, 0, cv::INTER_LINEAR);
// NCHW channels first
// cv::transpose(current_frame, current_frame);

// Convert Mat to Array/Vector in OpenCV https://stackoverflow.com/a/26685567
std::vector<float> input_data;
if (current_frame.isContinuous()) {
  input_data.assign(current_frame.data, 
    current_frame.data + current_frame.total()*current_frame.channels());
} else {
  for (int i = 0; i < current_frame.rows; ++i) {
    input_data.insert(input_data.end(), current_frame.ptr<float>(i), 
    current_frame.ptr<float>(i) + current_frame.cols * current_frame.channels());
  }
}

auto input = "images";
auto output = "output";                                                
size_t input_size = input_data.size() * sizeof(float);                                
const TRITONSERVER_DataType input_datatype = TRITONSERVER_TYPE_FP32;  
std::vector<int64_t> input_shape({current_frame.channels(), current_frame.rows, current_frame.cols}); 
const void* input_base = &input_data[0];

// Push data into Triton format
FAIL_IF_ERR(
    TRITONSERVER_InferenceRequestAddInput(
        irequest, input, input_datatype, &input_shape[0], input_shape.size()),
    "setting input meta-data for the request");

FAIL_IF_ERR(
    TRITONSERVER_InferenceRequestAppendInputData(
        irequest, input, input_base, input_size, requested_memory_type,
        0 /* memory_type_id */),
    "assigning INPUT data");
FAIL_IF_ERR(
      TRITONSERVER_InferenceRequestAddRequestedOutput(irequest, output),
      "requesting output for the request");

// Triton connection
auto p = new std::promise<TRITONSERVER_InferenceResponse*>();
std::future<TRITONSERVER_InferenceResponse*> completed = p->get_future();

FAIL_IF_ERR(
    TRITONSERVER_InferenceRequestSetResponseCallback(
        irequest, allocator, nullptr /* response_allocator_userp */,
        InferResponseComplete, reinterpret_cast<void*>(p)),
    "setting response callback");



FAIL_IF_ERR(TRITONSERVER_ServerInferAsync(server->get(), irequest, nullptr /* trace */),"running inference");

TRITONSERVER_InferenceResponse* completed_response = completed.get();

FAIL_IF_ERR(TRITONSERVER_InferenceResponseError(completed_response),"response status");
//<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< Here is the error

I tried running the same Triton server from its executable with my model repository and querying it over gRPC to check whether the ONNX export is correct; everything works smoothly via gRPC. Some Google results suggested this might be an issue specific to the RTX 20 series, but I reproduced the same error on an RTX 3070. I can narrow it down to something being wrong with my input image data, but the error message does not really pinpoint the problem. Any insight into where exactly to look would help a lot.
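One detail in the preprocessing above that may be worth double-checking (independently of the crash itself): the cv::Mat buffer is copied in OpenCV's interleaved HWC layout, while the model config declares FORMAT_NCHW, and the transpose meant to reorder the channels is commented out. Below is a minimal sketch of one way to build a planar CHW float buffer before appending it to the request. It uses cv::dnn::blobFromImage, a stock OpenCV DNN helper; this is an illustrative assumption on my part, not something the original code or the maintainers prescribe.

#include <opencv2/core.hpp>
#include <opencv2/dnn.hpp>
#include <vector>

// Sketch only: repack an 8-bit RGB frame into a contiguous NCHW float buffer.
// blobFromImage scales by 1/255, resizes to 512x512, and reorders HWC -> NCHW
// in one call, matching the config's [3, 512, 512] input with FORMAT_NCHW.
std::vector<float> MakeNCHWInput(const cv::Mat& rgb_frame)
{
  cv::Mat blob = cv::dnn::blobFromImage(
      rgb_frame, 1.0 / 255.0, cv::Size(512, 512),
      cv::Scalar(), /*swapRB=*/false, /*crop=*/false, CV_32F);

  // blob is a 4-D Mat of shape [1, 3, 512, 512]; flatten it into the vector
  // that is later handed to TRITONSERVER_InferenceRequestAppendInputData.
  const float* data = reinterpret_cast<const float*>(blob.data);
  return std::vector<float>(data, data + blob.total());
}

With a buffer laid out this way, the shape passed to TRITONSERVER_InferenceRequestAddInput would be { 3, 512, 512 } to match the dims in the model config.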

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 7 (5 by maintainers)

Top GitHub Comments

3 reactions
tanmayv25 commented, Jun 24, 2022

Not sure what the cause of this issue is. Filing a bug with the team to understand the failure.

0 reactions
krishung5 commented, Nov 3, 2022

@niqbal996 If requested_memory_type is set to GPU for the inputs, you would need to copy the input tensor to the GPU to make the error go away. To do so, you can refer to this part of simple.cc to see how to modify your script.
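For anyone following the same path as the original poster: a rough sketch of the host-to-device copy being suggested here is shown below, assuming requested_memory_type is TRITONSERVER_MEMORY_GPU and reusing the irequest, input, and input_data names from the snippet above. The error handling is a placeholder and the include path may differ by installation; simple.cc has its own helpers for this, so treat this only as an outline.

#include <cuda_runtime_api.h>

// Sketch: stage the preprocessed host tensor in device memory so that the
// pointer appended to the request actually lives in the memory type that
// requested_memory_type advertises (TRITONSERVER_MEMORY_GPU).
void* d_input = nullptr;
size_t input_size = input_data.size() * sizeof(float);

if (cudaMalloc(&d_input, input_size) != cudaSuccess) {
  /* handle allocation failure */
}
if (cudaMemcpy(d_input, input_data.data(), input_size,
               cudaMemcpyHostToDevice) != cudaSuccess) {
  /* handle copy failure */
}

FAIL_IF_ERR(
    TRITONSERVER_InferenceRequestAppendInputData(
        irequest, input, d_input, input_size,
        TRITONSERVER_MEMORY_GPU, 0 /* memory_type_id */),
    "assigning GPU INPUT data");

// d_input must remain valid until the inference response has been received,
// after which it can be released with cudaFree(d_input).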


Top Results From Across the Web

Onnxruntime error - Jetson Nano - NVIDIA Developer Forums
Name:'Conv_441 ' Status Message: CUDNN error executing cudnnFindConvolutionForwardAlgorithmEx( s_.handle, s_.x_tensor, s_.x_data, s_.w_desc, ...
Read more >
CUDNN ERROR: Failed to get convolution algorithm
I've seen this error message for three different reasons, with different solutions: 1. You have cache issues. I regularly work around this ...
Read more >
NVIDIA - CUDA | onnxruntime
The CUDA Execution Provider enables hardware accelerated computation on Nvidia CUDA-enabled GPUs. Contents. Install; Requirements; Build; Configuration Options ...
Read more >
ONNX converted TensorFlow saved model runs on CPU but ...
... status code returned while running Relu node. Name:'FirstStageFeatureExtractor/resnet_v1_101/resnet_v1_101/conv1/Relu' Status Message: ...
Read more >
cuDNN Library
number of SMs. ... Numerical overflow occurred during the GPU kernel execution. ... cudnnConvolutionForward(), this status will represent that error.
Read more >
