Failing to load the pre-trained weights on multiple GPUs - FasterRCNN example
🐛 Describe the bug
Hey, I have an issue when running the FasterRCNN example (https://github.com/pytorch/serve/tree/master/examples/object_detector/fast-rcnn): the pre-trained backbone weights appear to be loaded onto cuda:0 while the model itself is distributed across multiple GPUs. A similar issue was reported before for other architectures: #1037, #1038, and a related issue in pytorch/vision.
I believe this is the same class of problem, but I'm not sure how to handle it in this case. I'd appreciate the help.
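For illustration, here is a minimal sketch of loading the checkpoint explicitly onto the GPU that TorchServe assigns to a worker, assuming the stray cuda:0 allocation comes from weights being deserialized or downloaded without an explicit device mapping. The function name and the "model.pt" path are placeholders, not part of the example:

```python
import torch
import torchvision

def load_model_on_assigned_gpu(gpu_id: int, weights_path: str = "model.pt"):
    # Target the GPU TorchServe assigned to this worker, not cuda:0.
    device = torch.device(f"cuda:{gpu_id}" if torch.cuda.is_available() else "cpu")

    # Build the architecture without downloading pretrained weights, so nothing
    # is fetched or placed on a default device behind our back.
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(
        pretrained=False, pretrained_backbone=False
    )

    # map_location forces every tensor in the checkpoint onto this worker's
    # device, regardless of the device it was saved from.
    state_dict = torch.load(weights_path, map_location=device)
    model.load_state_dict(state_dict)
    model.to(device)
    model.eval()
    return model
```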
Error logs
Installation instructions
Install torchserve from source: No
Running in docker: Yes, inside this image: nvcr.io/nvidia/pytorch:21.02-py3
I cloned the serve repo, ran the install_dependencies script, and then pip-installed torchserve.
Model Packaging
I use the built-in handler: https://github.com/pytorch/serve/blob/master/ts/torch_handler/object_detector.py
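A minimal sketch of a custom handler that pins the worker to its assigned GPU before the built-in ObjectDetector handler loads the model; this is a workaround idea based on the related issues, not a confirmed fix, and the class name PinnedObjectDetector is illustrative:

```python
import torch
from ts.torch_handler.object_detector import ObjectDetector

class PinnedObjectDetector(ObjectDetector):
    def initialize(self, context):
        # TorchServe passes the assigned GPU index in the worker's
        # system properties; make it the current CUDA device so that
        # any allocation defaulting to "cuda" lands there, not on cuda:0.
        properties = context.system_properties
        gpu_id = properties.get("gpu_id")
        if torch.cuda.is_available() and gpu_id is not None:
            torch.cuda.set_device(int(gpu_id))
        super().initialize(context)
```

Such a handler file could then be passed to torch-model-archiver via its --handler option in place of the built-in object_detector handler.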
config.properties
default
Versions
Environment headers
Torchserve branch:
torchserve==0.6.0 torch-model-archiver==0.6.0
Python version: 3.8 (64-bit runtime)
Python executable: /opt/conda/bin/python
Versions of relevant python libraries:
captum==0.5.0
future==0.18.2
numpy==1.23.0
nvgpu==0.9.0
psutil==5.9.1
pytest==6.2.2
pytest-cov==2.11.1
pytest-pythonpath==0.7.3
pytorch-transformers==1.1.0
requests==2.28.0
sentencepiece==0.1.95
torch==1.9.0+cu111
torch-model-archiver==0.6.0
torch-workflow-archiver==0.2.4
torchaudio==0.9.0
torchserve==0.6.0
torchserve-dashboard==0.5.0
torchtext==0.10.0
torchvision==0.10.0+cu111
wheel==0.37.1
Java Version:
OS: N/A
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: N/A
CMake version: version 3.19.4
Is CUDA available: Yes
CUDA runtime version: 11.2.67
GPU models and configuration:
GPU 0: NVIDIA GeForce RTX 3090
GPU 1: NVIDIA GeForce RTX 3090
GPU 2: NVIDIA GeForce RTX 3090
GPU 3: NVIDIA GeForce RTX 3090
GPU 4: NVIDIA GeForce RTX 3090
GPU 5: NVIDIA GeForce RTX 3090
GPU 6: NVIDIA GeForce RTX 3090
GPU 7: NVIDIA GeForce RTX 3090
Nvidia driver version: 510.54
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.1.0
Repro instructions
Follow the steps at https://github.com/pytorch/serve/tree/master/examples/object_detector/fast-rcnn and run nvidia-smi in a separate terminal to watch per-GPU memory (a small Python polling sketch is included below).
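For convenience, here is a small Python sketch (not part of the original example) that polls per-GPU memory via nvidia-smi while the requests are sent, which makes it easy to see whether cuda:0 holds an extra copy of the weights:

```python
import subprocess
import time

def gpu_memory_mib():
    """Return {gpu_index: used_MiB} as reported by nvidia-smi."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=index,memory.used",
         "--format=csv,noheader,nounits"],
        text=True,
    )
    usage = {}
    for line in out.strip().splitlines():
        index, used = line.split(",")
        usage[int(index)] = int(used)
    return usage

if __name__ == "__main__":
    # Print usage every couple of seconds while requests are being served.
    while True:
        print(gpu_memory_mib())
        time.sleep(2)
```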
Possible Solution
No response
Top GitHub Comments
Closing this. Please re-open if the issue is not resolved.
@jonathan-ibex Thanks for checking. Will debug further and get back to you