
`import kornia` breaks CUDA lazy init


Describe the bug

Hi there. Thank you for providing a great library!

I found a bug related to CUDA initialization in kornia==0.6.8.

The bug shows up when setting the `CUDA_VISIBLE_DEVICES` environment variable from inside a Python script.

Reproduction steps

Below is my workstation's GPU configuration:

$ nvidia-smi -L
GPU 0: NVIDIA A100-SXM4-40GB (UUID: GPU-749126bf-d321-4567-7c2d-6b03eced6bb6)
GPU 1: NVIDIA A100-SXM4-40GB (UUID: GPU-f326dc87-2766-a6cd-2c76-9d3860157f88)
GPU 2: NVIDIA A100-SXM4-40GB (UUID: GPU-feae10c6-1b2e-88e2-19a2-d1787eb645b6)
GPU 3: NVIDIA A100-SXM4-40GB (UUID: GPU-343edb27-9d92-2038-65fd-5596aff0bf33)
GPU 4: NVIDIA A100-SXM4-40GB (UUID: GPU-83bd83b6-f178-0b19-6216-10e8129a67ee)
GPU 5: NVIDIA A100-SXM4-40GB (UUID: GPU-4e238c07-cda8-5fdd-ae0b-b961e4bb0b29)
GPU 6: NVIDIA A100-SXM4-40GB (UUID: GPU-0388c595-fb53-1000-e6c3-b226f77b8a21)
GPU 7: NVIDIA A100-SXM4-40GB (UUID: GPU-fed61f6c-f856-e1ff-4705-539a72e61848)

Then I run the script below, which tries to select GPUs from within the script:

import kornia  # this import already runs a torch.cuda query (the bug)
import os
import torch

# "32" matches no physical GPU, so no devices should be visible after this.
os.environ["CUDA_VISIBLE_DEVICES"] = "32"

print(torch.__version__, torch.cuda.device_count())

The actual output:

$ python test.py
1.12.0+cu113 8

Expected behavior

The expected output would be:

$ python test.py
1.12.0+cu113 0

The `CUDA_VISIBLE_DEVICES` environment variable is used to switch between or limit the GPUs visible to a process (see the NVIDIA reference). It is typically used in one of two ways:

CUDA_VISIBLE_DEVICES="1,2" python train.py

or

import os

os.environ["CUDA_VISIBLE_DEVICES"]="1"

For example, YOLOv5 uses this method for GPU selection: https://github.com/ultralytics/yolov5/pull/8497

The same bug also occurred in PyTorch itself and has since been fixed (my own issue): https://github.com/pytorch/pytorch/issues/80876

I think the library should not initialize CUDA at import time, so that the user keeps control over it. CUDA state is initialized as soon as a `torch.cuda` query such as `torch.cuda.is_available()` is executed.
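
A minimal sketch of that behavior (assuming a multi-GPU machine; the counts are illustrative):

import os
import torch

# The first CUDA query reads CUDA_VISIBLE_DEVICES and initializes CUDA;
# the set of visible devices is then fixed for the rest of the process.
print(torch.cuda.device_count())  # e.g. 8

os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # too late: ignored from now on
print(torch.cuda.device_count())  # still 8, not 1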

The bug is caused by these two places:

https://github.com/kornia/kornia/blob/db988652b13f46ac0dfedc5f883e11a848a6ca07/kornia/feature/adalam/adalam.py#L31

https://github.com/kornia/kornia/blob/db988652b13f46ac0dfedc5f883e11a848a6ca07/kornia/utils/helpers.py#L12-L28

We can fix it the same way the official PyTorch repo did: https://github.com/pytorch/pytorch/pull/80899/files

- if torch.cuda.is_available():
+ # nvFuser imports are conditional on being compiled with CUDA
+ if hasattr(torch._C, "_nvfuser"):

If there is no objection, I will open a pull request along these lines.
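
For kornia, a minimal sketch of the same idea (the names `_default_device` and `Matcher` are hypothetical, not kornia's actual API): defer the CUDA check from import time to first use.

import torch

def _default_device() -> torch.device:
    # Evaluated only when first needed, so importing the library never
    # touches CUDA and the user can still set CUDA_VISIBLE_DEVICES first.
    return torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")

class Matcher:
    def __init__(self, device=None):
        # Resolve the default lazily in the constructor instead of in a
        # module-level default argument evaluated at import time.
        self.device = device if device is not None else _default_device()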

Environment

wget https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py
# For security purposes, please check the contents of collect_env.py before running it.
python collect_env.py
Collecting environment information...
PyTorch version: 1.12.1+cu113
Is debug build: False
CUDA used to build PyTorch: 11.3
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.5 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: Could not collect
CMake version: version 3.16.3
Libc version: glibc-2.31

Python version: 3.10.5 (main, Jul  6 2022, 16:47:12) [GCC 9.4.0] (64-bit runtime)
Python platform: Linux-5.4.0-125-generic-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: 11.4.152
GPU models and configuration:
GPU 0: NVIDIA A100-SXM4-40GB
GPU 1: NVIDIA A100-SXM4-40GB
GPU 2: NVIDIA A100-SXM4-40GB
GPU 3: NVIDIA A100-SXM4-40GB
GPU 4: NVIDIA A100-SXM4-40GB
GPU 5: NVIDIA A100-SXM4-40GB
GPU 6: NVIDIA A100-SXM4-40GB
GPU 7: NVIDIA A100-SXM4-40GB

Nvidia driver version: 470.141.03
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.5.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.5.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.5.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.5.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.5.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.5.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.5.0
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.23.1
[pip3] torch==1.12.1+cu113
[pip3] torch-tb-profiler==0.4.0
[pip3] torchaudio==0.12.1+cu113
[pip3] torchvision==0.13.1+cu113
[conda] Could not collect



Top GitHub Comments

johnnv1 commented on Oct 19, 2022:

Not related to the problem, but @shijianjian's suggestion will also improve the import time a bit (from ~8 ms to ~0.2 ms).
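
For reference, CPython's `-X importtime` flag prints a per-module breakdown of import cost, which is one rough way to check such numbers yourself (timings vary by machine):

$ python -X importtime -c "import kornia" 2>&1 | tail -n 5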

ducha-aiki commented on Oct 19, 2022:

@shijianjian you are right. Let me fix that in adalam
